Foreword


The performance of a system as complex in concept as STRETCH 
is extremely difficult to evaluate. The operation time for each unit is 
basically dependent on the particular operation it is performing; the
performance of the system as a whole is dependent on a complex mix 
of the individual units and their interactions with one another. Further- 
more, the performance of the system with respect to a synchronous 
machine will vary, depending on the particular program which is being 
run, and no single simple measure of performance can be obtained. 

There are many independent criteria for determining machine perform- 
ance. Each user is ultimately interested in the ability of the machine to 
solve competently, in a feasible financial and technical fashion, the prob- 
lem or problems peculiar to individual requirements. While we do not have 
exhaustive information on total system performance, all the problem appli- 
cations that are available for release at this time are discussed in the fol- 
lowing pages. Other problems have been run, but IBM does not have 
permission at this time to distribute this data. 

The STRETCH remains, in our honest judgment, the most power- 
ful and potentially productive piece of computing machinery available in 
the world today. It represents a real challenge to those with problems 
heretofore unprocessable on older machines with anything like reason- 
able efficiency. It is a machine ideally suited for special large problems 
drawn from the areas of matrix multiplication and inversion, linear
programming, Leontief input/output models, three-dimensional non-linear
partial differential equations, and certain areas of simulation. The area 
of application potential is discussed in greater detail in the following 
pages. 

The information in this report shows many aspects of the
performance of the STRETCH system and contains examples of test
programs which have been run and timed.


3/8/61 


1-2 



Programming 


7030 PROGRAMMABILITY 

Unlike earlier IBM data processing systems, the 7030 makes very 
extensive use of overlapped operations. While this overlapping adds 
significantly to the over-all speed of the machine, it does complicate 
the problem of writing optimized programs because the time taken to 
perform a given operation depends on whether or not the look-ahead 
feature has been able to locate and bring from memory the required 
data while previous operations were being performed. Thus, the order 
in which instructions are written can be important if an optimal program 
is required. In many instances the programmer can forget about such 
considerations without significant loss of speed. There are, however, 
some situations to which the programmer must pay special attention if 
maximum speed is to be attained. 

In order to help programmers use the 7030 system efficiently, a 
number of hints have been drawn up, and these are given below. The 
basis of these hints is twofold: a theoretical study of the 7030 logical 
organization, and limited practical experience of running the 7030 pro- 
grams. As more operating experience on the 7030 is gained, it must be 
expected that further rules or hints for good programming will be 
developed. 

Hints Towards Good 7030 Programming 

Generalities 

1. Efficiency in computer problem solving involves the balancing
of the following factors: 
a. Accuracy of results

b. Analysis effort 

c. Programming time 

d. Debugging time 

e. Production run time 

f. Effectiveness in repeated use of program (possibly by a
stranger) 

The relative weights of these factors vary from problem to 
problem, individual to individual, and from installation to in- 
stallation. For small one-shot problems the trend is towards 
the emphasis on a, b, c, d. 

2. Timing is important for much-traversed inner loops, but
usually less important elsewhere.

3. There are usually many ways of doing the same problem. 

4. The 7030 will not be efficiently used when the programmer 
tries to make it look like machine X. 

5. Advantage should be taken of special features in STRAP and 
MCP to minimize errors and to simplify debugging. 

6. Machine efficiency is gained by distributing the work over as 
many major units as possible so that at any given time no 
major unit is idle.

7. Memory conflict can be largely removed by putting instructions 
and data in separate memory box groups. 

8. Information transmittal between autonomous major units is 
through buffer registers. The I-box buffers 1Y, 2Y and the
look-ahead buffer levels LA0, LA1, LA2, and LA3 should not
be left empty over extended lengths of time. Nor should they 
be constantly crowded by data with little information content. 


9. It is perfectly permissible to use floating point operations on 
VFL quantities or binary operations on decimal quantities. 

Specifics 

1. Floating point operations are usually E-box limited in timing
(exceptions L, LWF, DL, DLWF and ST). VFL operations involve
extensive decoding and execution time, and are usually
much slower than the floating point counterparts. I-box
operations usually do not involve the E-box, and the I-box time
can be covered largely by neighboring floating point operations.

2. I-box fetches are less efficient than E-box fetches, since the 
latter are greatly enhanced by look-ahead buffering. 

3. SV, SC, SR are more time consuming than SX, for the latter
does not call for an I-box fetch. 

4. Information transfer from the E-box to the I-box is relatively 
time consuming, but is still faster than, say from the E-box 
to main memory, then immediately from main memory to the 
I-box. 

5. Immediate operands require no fetch and are to be preferred, 
particularly for I-box operations. 

6. All VFL stores are fetch-and-stores. Every involves a 
fetch, VFL arithmetic, and a store. B(ind) for non-index in- 
dicators is similar to BB except that I-box activity is less ex- 
tensive, and the fetch-store involve an internal operand (SIND). 

7. VFL information will be processed more efficiently if word
boundary crossover is not present. Otherwise there will be
over-exercising of the memory and LA.

8. Take advantage of forwarding, but avoid (if easily accomplished) 
other types of store-close-to-fetch. 
9. Avoid consecutive stores since they are time consuming, as is
forwarding more than once. The I-box otherwise would be 
standing still and LA gradually drained. 

10. All successful branch instructions will temporarily remove 
the I-box buffer. 

11. The following branches are considered unconditional by the I- 
box: 

CB and variants

B(ind) for XF, XVLZ, XVZ, XVGZ, XCZ, XL, XE, XH 
and are performed correctly by the I-box. 

12. The following are considered by I-box to be truly conditional 
branches: 

B(ind) for non-index indicators 
BB 

I-box makes the tentative assumption that the branch is not 
successful, and processes ahead. If the assumption proved 
wrong, branch-recovery will be performed, which requires the 
cleaning of I-box, restoring of pre-processed index registers, 
and cleaning of LA before resumption of normal activities. 
Conditional branches should be largely unsuccessful, even if 
an additional (unconditional) branch instruction has to be added
to the program. 

13. BD, RNX, T and SWAP require cleaning of LA. 

14. Interruption involves cleaning of the I-box, restoring of index
registers, execution of a pseudo B(ind) instruction, fetching,
and execution of a free instruction before the resumption of
normal activities. Judicious use of this feature, however,
allows the writing of inner loops with few time consuming branch
instructions.
15. VFL instructions require from 2 to 6 levels of LA and involve
the slower byte processing. The power of such instructions 
(particularly the logical connective instructions), however, 
frequently compensates for the slow speed. 

16. Begin a loop with a full-word address, even if a CNOP has to 
be placed just prior to the loop. This avoids repeatedly fetch- 
ing an instruction which is not used. 

17. For optimum speed, fetch-type I-box instructions should
occupy the second half of a full word.

18. The following special registers are bona fide memory
locations, and are subject to the usual memory restrictions:

0. ($Z), 4. ($MB), 13. ($RM), 14. ($FT), 15. ($TR).

The load factor instruction thus involves a fetch and a store. 

APPLIED PROGRAMMING SYSTEMS FOR THE IBM 7030 

Listed below are the systems programs which have been designed 
and are now being written and tested for the 7030 system. Their purpose is 
to provide efficient and productive use of the computer system, to permit 
applications to be programmed easily, and to assist IBM 704, 709, and 
7090 users in making the transition to the 7030. 

Briefly, the programs being provided are: 704, 709, and 7090 pro- 
grams for simulating the 7030 and assembling for it; programs for sim- 
ulating the 704 or 709 on the 7030; a master control program for 7030 
operations; and processors for STRAP symbolic, SMAC macro, and 
FORTRAN language programming. Programs will be released in field 
test version according to the schedule in Table 1. 
IBM 7030 Packages for the 704, 709, and 7090.

These packages permit programming to be partially checked in 
advance of 7030 installation. They consist of an assembly program, 
STRAP I, and a 7030 simulator. Versions are available for each of the 
three different machines: 704, 709, and 7090. 

Note

STRAP I was designed in a joint effort by Los 
Alamos Scientific Computing Laboratory and 
International Business Machines Corporation 
personnel, and it was programmed by Los 
Alamos personnel.

STRAP I accepts all 7030 instructions plus some of the
pseudo-instructions of STRAP II. All programs written for STRAP I are
accepted by STRAP II. The output of STRAP I can be executed directly
on the 7030 or (via the simulator) on the 704, 709, or 7090. The speed 
of execution on the 704 or 709 is several thousand times slower than 
on the 7030. No attempt is made to simulate the timing details of I/O 
operations. 

IBM 704 and 709 Simulators for the 7030. 

These programs permit 704 and 709 programs to be run on the
7030 without reprogramming. The simulated machine has 32K memory,
8 logical drums, 10 tape units, printer, punch, and operator
console. The console is simulated on the 7030 console and includes all
features of the 704 or 709 console.
Note


If the size of 7030 memory is reduced, the size 
of the simulated machine is also reduced. The 
following table obtains:

    7030 Memory     704 or 709

    16K             4K or 8K without drum
    32K             8K without drum
    48K             32K without drum
    65K             Maximum configuration

One tape per tape to be simulated.

There is no attempt to simulate CRT output. In addition, the PSE
instructions to the printer exit hubs are handled according to the SHARE
II board. Since there is no standard board for the punch, the PSE
instructions to the punch exit hubs are NOP's. Otherwise the simulation
duplicates as closely as possible the actual performance of the 704 or
709.


Due to the different word lengths, it is necessary to pre- or
post-process binary tapes to be communicated between the 704 or 709 and the 7030.
The 7030 programs to perform these operations are included as part of 
the package. 

In general, the simulator is about three times slower than the 704 
or 709. Floating point instructions are somewhat faster than this and 
fixed point instructions somewhat slower. Because of the speeds of the
7030 I/O units, programs can run faster on the 7030 than on the 704 or
709 if they are I/O limited on those machines.
Master Control Program

The master control program is an automatic operating system which 
runs job after job automatically. It plans the actual assignment of
symbolic I/O units in advance, so as to minimize conflicts and delays between
successive jobs, issuing tape mounting and demounting instructions to the
operator and checking (through reel labels) that tapes are mounted
correctly. For each job, MCP offers the options of COMPILE, GO, and
COMPILE and GO. Here COMPILE can refer to any of the several language
processors listed below. MCP also arranges for checkout and
post-mortem procedures where needed. Also, it is the agent by which further
service routines (such as an installation logging routine) can be easily 
added to the operating system. 

In the individual program, MCP provides a complete input/output
system. In particular, an option is provided for buffered operation of 
the card reader and high-density blocked input and output SPOOL tapes, 
permitting easy and efficient overlap of computing with the input of pro- 
gram and data and the output of results. Standard methods for reading, 
printing, and punching data are provided. Also, all interrupts are mon- 
itored by MCP. I/O interrupts are returned to the program in a form 
convenient to manipulate. The programmer can designate how maskable 
interrupts are to be processed. 

Including buffer areas, MCP occupies 8K of storage space. An 
IBM 1401 tape system is required for off-line operations. The time re- 
quired for the system to exercise its supervisory functions is not yet 
known in detail, but is estimated to be small compared with the time 
saved by its use. 
STRAP II Assembly

STRAP II is a symbolic programming system for the 7030. It
defines a complete set of mnemonics for the 7030 instructions, together 
with the pseudo-operations for data definition and such assembly opera- 
tions as origin setting, space reservation, listing control, and identifica- 
tion of output. 

Features provided for are as follows: acceptance of programmer 
symbols up to 128 characters in length; acceptance of source language
numerical information written with radix 2 through 10, and 16; address 
arithmetic involving addition, subtraction, multiplication, and division; 
error messages on the output listing; extensive tailing facilities that 
permit up to 10 unique levels of tails to be appended to programmer 
symbols; and the option of saving the symbol table for subsequent pur- 
poses. 

STRAP II operates on a minimum size 7030 computer (24K plus
disk). The first version operates independently; a later version will be 
adapted to operation by MCP. 

SMAC 

SMAC processes macro-instructions of the simple substitution 
type (but permitting macros within macros), thus adding the next higher 
level to the machine language of STRAP II. The result is a conveniently
open-ended machine-oriented language. 

SMAC runs on a minimum 7030 and is operated by MCP. 

FORTRAN 

The FORTRAN processor handles the FORTRAN II statements.
Any standard FORTRAN II program is accepted and converted into a
program suitable for further processing by SMAC. Much of the
processor is of a general purpose nature and is expected to be useful in other
advanced programming systems the user may care to develop. 

The speed of compilation from FORTRAN to machine language is 
expected to be approximately twice that of the new 7090 FORTRAN now 
being developed by applied programming. The object programs which
result are expected, for typical FORTRAN applications, to run at an 
average of 75 percent as fast as equivalent programs which have been 
carefully hand-coded. This degree of efficiency is obtained by including 
throughout the processor chain a large number of the criteria for
efficient 7030 programming listed earlier. Further such rules are likely
to be discovered in the future, and the structure of the compiler is ex- 
pected to be flexible enough to accommodate most of them. 

FORTRAN runs on a minimum 7030 and is operated by MCP. 
TABLE 1. STATUS OF 7030 PROGRAMMING SYSTEMS

Systems         Status as of     Approx. Number    Manual Availability            Estimated Program
                March 1, 1961    of Instructions                                  Completion Date

704/709/7090    Operational      24,000            Available now as form          -
Package                                            No. C22-6531-1

704/709         Coding           5,000             Aug. 1961                      Aug. 1961
Simulator       completed

STRAP II        Coding           14,000            Reference Manual               July 1961
                completed                          April 1961.
                                                   Operators Bulletin
                                                   June 1961.

MCP             Coding           8,000             Preliminary edition of         Oct. 1961
                completed                          user's guide now available
                                                   as form No. J22-6559.
                                                   Reference Manual
                                                   Oct. 1961

SMAC            Coding           4,000             Sept. 1961                     Sept. 1961
                completed

FORTRAN         Coding in        50,000            Preliminary Bulletin           March 1962
                progress                           July 1961
3

Execution Times 


RAW SPEEDS AND THEIR INTERPRETATIONS 
E-Box Times for Basic Instructions 

Floating Point Instruction Times 

Following is a summary of "raw" floating point execution. The
times are predicated upon the availability of data and instructions when 
needed; that is, the times given are the maximum speed at which the 
floating point unit may operate. If some memory access (look-ahead, 
I-box, etc.) factors enter such that data and instructions are not avail- 
able when the floating point unit is able to accept them (or requires 
them), the extra delay thus caused is added to the times below. 

The times are broken up into two types of cycles; pre- execution 
and execution, defined as follows: 

• Pre-Execution — That part of the instruction which may be 
executed before any modification of an addressable register 
occurs. Floating point instructions are begun as soon as 
data paths are free and the instruction and initial addressed 
operand is made available. This time will overlap checking 
of a previous instruction, indicator setting, interrupt testing,
memory storage, etc. These cases may lead to a condition 
such that the operation should never have been started
(interrupt occurs as result of previous instructions). In such a
case the pre-execution is terminated and no addressable
registers would be modified. (As far as programmer is
concerned pre-execution never occurred.)

• Execution — The part of the instruction which follows the 
initial modification of an addressable register up to the point
when the next floating point instruction may begin. 

The times are listed (Table 2) in terms of the basic machine cycle,
which is presently 0.3 microsecond.
TABLE 2. FLOATING POINT INSTRUCTION TIMES

Instruction                       Pre-execution                   Execution

+ (Add)                           4 cycles (pre-shift ≤ 3)        1 cycle (norm ≤ 6)
+MG (Add to Mag.)                 1 cycle for each additional     2 cycles for each additional
M+ (Add to Mem.)                  pre-shift of 4                  norm of 6
M+MG (Add Mag. to Mem.)                                           2 cycles if recomplement
K (Compare)                                                       is necessary - note 1
KMG (Compare Mag.)                                                2 cycles if forced zero
KR (Compare for Range)                                            occurs as result of "mag."
KMGR (Compare Mag. for Range)                                     operation.

D+ (Add Double)                   Same as Add class               Same as Add class
D+MG (Add Double to Mag.)                                         plus
F+ (Add Fraction)                                                 1 cycle if immediate
                                                                  next instruction is floating
                                                                  point

L (Load)                          2 cycles                        1 cycle (norm ≤ 6)
LWF (Load with Flag)                                              2 cycles for each additional
LFT (Load Factor)                                                 norm of 6
ST (Store)

DL (Load Double)                  2 cycles                        1 cycle (norm ≤ 6)
DLWF (Load Double with Flag)                                      2 cycles for each additional
                                                                  norm of 6
                                                                  1 cycle if immediate
                                                                  next instruction is floating
                                                                  point

SRD (Store Rounded)               3 cycles                        1 cycle (norm < 6)
                                                                  2 cycles for each additional
                                                                  norm of 6

SLO (Store Low Order)             2 cycles                        17 cycles - unnormalized
                                                                  15 cycles if normalized
                                                                  and there are no leading
                                                                  zeros in intermediate
                                                                  fraction
                                                                  2 cycles for each norm
                                                                  of 6 or less
TABLE 2. FLOATING POINT INSTRUCTION TIMES (cont'd)

Instruction                       Pre-execution                   Execution

SHF (Shift Fraction)              4 cycles (shift ≤ 3)            1 cycle
Applies to both left and          1 cycle for each                1 cycle if immediate
right shift                       additional shift of 4           next operation is floating
                                                                  point

E+ (Add Exponent)                 1 cycle                         5 cycles (norm ≤ 6)
E+I (Add Immed. to Exp.)                                          2 cycles for each additional
- note 2                                                          norm of 6
                                                                  1 cycle if immediate
                                                                  next operation is floating
                                                                  point

* (Multiply)                      5 cycles                        4 cycles (norm < 6)
                                                                  2 cycles for each additional
                                                                  norm of 6

D* (Multiply Double)              5 cycles                        4 cycles (norm ≤ 6)
                                                                  2 cycles for each additional
                                                                  norm of 6
                                                                  1 cycle if immediate next
                                                                  operation is floating point

*+ (Multiply and Add)             3 cycles                        17 cycles (pre-shift ≤ 3
                                  cannot enter execution          and norm ≤ 6)
                                  cycles until Factor is          1 cycle for each additional
                                  available                       pre-shift of 4
                                                                  2 cycles for each additional
                                                                  norm of 6
                                                                  2 cycles if recomplement
                                                                  is necessary - note 1
                                                                  1 cycle if immediate next
                                                                  operation is floating point

R/ (Reciprocal Divide)            2 cycles                        2 cycles plus
                                                                  all pre-execution and
                                                                  execution cycles of Divide
TABLE 2. FLOATING POINT INSTRUCTION TIMES (cont'd)

Instruction                       Pre-execution                   Execution

/ (Divide)                        2 cycles (divisor shift ≤ 6)    2 cycles (dividend shift ≤ 6)
                                  2 cycles for each               2 cycles for each additional
                                  additional divisor shift        dividend shift of 6, or total of
                                  of 6                            3 cycles if dividend is all zero
                                                                  2 cycles for initial reduction
                                                                  loop if dividend is normalized
                                                                  3 cycles for initial dividend
                                                                  pass if dividend has leading
                                                                  zeros.
                                                                  2 cycles for each additional
                                                                  reduction loop (number of
                                                                  cycles is data dependent).
                                                                  2 cycles if final remainder
                                                                  has to be complemented
                                                                  1 cycle - all

                                  If zero divisor
                                  3 cycles pre-execution and
                                  2 cycles execution instead of
                                  previous

D/ (Divide Double)                Same as Divide                  Same as Divide
                                                                  plus
                                                                  6 cycles (remainder norm ≤ 6)
                                                                  2 cycles for each additional
                                                                  remainder norm of 6

SRT (Store Root)                  2 cycles                        106 cycles (norm ≤ 6)
                                                                  1 cycle if operation began
                                                                  with a "B" pulse
                                                                  2 cycles for each additional
                                                                  norm of 6
NOTE 1 - When fraction signs are unlike, the operand which is
pre-shifted will be complemented. Recomplementing will only
occur if the complemented (following pre-shift) fraction
is larger in magnitude than unshifted fraction.

If fraction signs are alike no complementing occurs.

NOTE 2 - Add Exponent and Add Immediate to Exponent will most
likely be changed in the near future to three pre-execution
and three execution cycles.
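
The cycle counts in Table 2 can be turned into microseconds with a little arithmetic. The sketch below does this for the Add class only. It is an illustration, not part of the original report: the function names are hypothetical, the thresholds are read as "pre-shift up to 3" and "norm up to 6", and "each additional pre-shift of 4" (or "norm of 6") is assumed to mean one increment per started group of 4 (or 6) positions.

```python
import math

CYCLE_US = 0.3  # basic machine cycle quoted in the text, in microseconds


def extra_steps(amount, free, step):
    """Count the additional increments of `step` needed beyond the first `free` positions.

    Assumption: a partly used increment counts as a whole one (ceiling).
    """
    if amount <= free:
        return 0
    return math.ceil((amount - free) / step)


def add_class_cycles(pre_shift=0, norm=0, recomplement=False, forced_zero=False):
    """Return (pre-execution, execution) cycles for the Add class of Table 2."""
    pre = 4 + extra_steps(pre_shift, 3, 4)   # 4 cycles, plus 1 per extra pre-shift of 4
    exe = 1 + 2 * extra_steps(norm, 6, 6)    # 1 cycle, plus 2 per extra norm of 6
    if recomplement:
        exe += 2                             # see note 1
    if forced_zero:
        exe += 2                             # forced zero on a magnitude operation
    return pre, exe


def cycles_to_us(cycles):
    """Convert machine cycles to microseconds, rounded to one decimal place."""
    return round(cycles * CYCLE_US, 1)


pre, exe = add_class_cycles()
print(pre, exe, cycles_to_us(pre + exe))
```

With no extra pre-shift or normalization the Add class takes 4 + 1 = 5 cycles, or 1.5 microseconds at the 0.3 microsecond cycle. As the text states, any memory access delay is added to this figure.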


Serial Arithmetic Execution Times 

The execution times of SAU instructions can be presented in
terms of some basic equations shown in Table 3 below.

The column headings are defined as follows: 

• Operation Code - instruction abbreviations. 

• Pre-execution Time - time required to decode operation and 
set up controls. 

• Execution Time - time required to perform the instructed 
function. 

• Termination Time - time required to set indicators and clear 
unit. 

• Full Word Total - a computed time in microseconds for the 
operations using unsigned full word operands which produce 
no arithmetic carries. 

• Comments - variations or additions to the execution time 
equations. 

All equations were developed for unsigned operations. Signed
operations can be computed by adding 0.6 microsecond to the
pre-execution time. An additional 0.6 microsecond must be added if the
result of the operation requires complementing. The following
operations require no additional time when signed: SRD, C, CM, CT, CV,
DCV, LCV (D-B), LTRCV (D-B), LTRS, LFT.


TABLE 3. EQUATIONS FOR SAU INSTRUCTIONS

Operation  Pre-execution  Execution      Termination  Full Word
Code       Time           Time           Time         Total      Comments

+          1.8                           .6           7.2 us     Unlike signs CD > AB, add .6(^+ ~ + Z + 1)
                                                                 to the execution time.

+MG        1.8            .6(|.z)        .6           7.2 us     Unlike signs CD > AB, add .6 to the
                                                                 execution time.

L          1.8            .6(X/Y)        .6           7.2 us

LWF        1.8            .6(X/Y)        .6           7.2 us

M+         1.8            .6(f)          .6           7.2 us     Unlike signs AB > CD, add .6(- + 1) to the
                                                                 execution time.

M+MG       1.8            .6(f)          .6           7.2 us     Unlike signs AB > CD, add .6(5 + 1) to the
                                                                 execution time.

M+1        1.8            .6(C + 1)      .6           3.0 us     Unlike signs, execution time = .6(y) when
                                                                 true result or .6(2^ + 1) when complement
                                                                 result.

ST         1.8            .6(X/Y)        .6           7.2 us

SRD        3.0            .6(X/Y)        .6           8.4 us     No additional time required when operation
                                                                 is signed.

K          1.8            .6(-+ -) y r   .6           7.2 us

KF         1.8            .6(X/Y)        .6           7.2 us

KE         1.8            .6(f + y r     .6           7.2 us

KFE        1.8            .6(X/Y)        .6           7.2 us
TABLE 3. EQUATIONS FOR SAU INSTRUCTIONS (cont'd)

Operation    Pre-execution  Execution        Termination  Full Word
Code         Time           Time             Time         Total      Comments

KR           1.8            1                .6           7.2 us

KFR          1.8            .6(^)            .6           7.2 us

C            1.8            .6(^)            .6           7.2 us

CM           1.8            .6(X/Y)          .6           7.2 us

CT           1.8            .6(f)            .6           7.2 us

LTRS         2.4            .6(f+3)          .6           9.6        If the effective FL is greater than 48 bits,
                                                                     add .6(t -1) to the execution time.

LFT          2.4            .6(f) + 3)       .6           9.6        If the effective FL is greater than 48 bits,
                                                                     then add .6(^ ~1) to the execution time.

LCV (B-D)    2.4            .6(x + g- + 1)   .6           45.3

LCV (D-B)    2.4            O/X t u ..       .6           14.7
                            •^^7 ^ f 8 +

LTRCV (B-D)  2.4            .6(x + 3)        .6           29.4       If the converted field length is greater
                                                                     than 48 bits, add .6(^^c -1) to the
                                                                     execution time.

LTRCV (D-B)  2.4            .6(f + 16)       .6           21.6       If the field length to be converted after all
                                                                     zone bits have been removed is greater
                                                                     than 48 bits, add .6(^ -1) to the execution
                                                                     time.

CV (B-D)     2.4            .6(s + g ^ ^)    .6           37.2





Operation    Pre-execution  Execution         Termination  Full Word
Code         Time           Time              Time         Total      Comments

DCV (B-D)    2.4            .6(s +1+1)        .6           63.0

CV (D-B)     2.4            .6(j + 1)         .6           10.8

DCV (D-B)    2.4            .6(| + 2)         .6           18.6

*            6.0            3.9               .6           10.5

*+           7.8            3.0 + exec. time  .6           18.6

/            5.4            Q                 .6           24.0

A = 1. Zero DD .DDL_^^ 

Ml, »«Qt» 

2, Leading zeroes -(1 + - — + 

6 

DDL - DRL - #L 

4 ' 

3. Complement Result. -Add 1 to A2. 




Glossary of Terms

DD           Dividend
DR           Divisor
DDL          Dividend length
DRL          Divisor length
L "0"        Leading zeros
#L "0"       Number of leading zeros
DDL or DRL   Dividend or divisor length, whichever is greater
OS           Offset
FL           Field length
BS           Byte size
B - D        Binary to decimal
D - B        Decimal to binary
R            8 in binary, 4 in decimal
S            Accumulator field specified by the offset minus all
             leading zeros
T            Field length less all zone bits
U            Result field
V            96 in binary, 92 in decimal
W            The number of accumulator significant bits greater
             than the field length
X            Field length in unsigned operations; field length
             minus the byte size in signed operations
Y            8 in binary operations and byte size in decimal
             operations
Z            One for a carry out of the last Y byte plus one
             for a carry out of each succeeding 8 bit byte
             in binary or 4 bit byte in decimal
C            The number of times a carry results from an
             8 bit byte in binary or a 4 bit byte in decimal
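
As a check on how the Table 3 columns combine, the sketch below adds pre-execution, execution, and termination times and applies the signed-operation rule stated earlier. It is illustrative only: the function name is hypothetical, and the execution term for L is assumed to be 0.6 (X/Y), which is consistent with the 7.2 microsecond full word total when X = 64 bits and Y = 8 (binary). The M+1 term 0.6 (C + 1) and the SRD pre-execution time of 3.0 are taken from the table.

```python
STEP_US = 0.6  # time unit that appears throughout the Table 3 equations


def sau_total_us(pre_us, exec_units, term_us=0.6, signed=False, complement=False):
    """Total SAU time in microseconds.

    exec_units is the bracketed part of the execution equation, e.g. X/Y.
    signed adds 0.6 to the pre-execution time; complement adds a further 0.6.
    Do not set signed for the operations the text lists as needing no extra time.
    """
    total = pre_us + STEP_US * exec_units + term_us
    if signed:
        total += STEP_US
    if complement:
        total += STEP_US
    return round(total, 1)


X, Y = 64, 8                                 # unsigned full word, binary
load_total = sau_total_us(1.8, X / Y)        # L, assuming execution = 0.6 (X/Y)
m_plus_1_total = sau_total_us(1.8, 0 + 1)    # M+1 with no carries (C = 0)
srd_total = sau_total_us(3.0, X / Y)         # SRD has a longer pre-execution time
print(load_total, m_plus_1_total, srd_total)
```

These three cases reproduce the full word totals listed in Table 3 (7.2, 3.0, and 8.4 microseconds), which lends some support to the assumed form of the execution term.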


Timing of I-Box Instructions 

The following tables list the I-box times for decoding for
execution of I-box instructions.
TABLE 4. I-BOX TIMES - DIRECT INDEX

Instruction         Address in EM                  Address in XS            Address in IR

LX                  4.8 µsec + dec                 4.8 µsec + dec           4.2 µsec + dec + LAdr

LV, LC, LR          4.2 µsec + dec                 4.2 µsec + dec           3.6 µsec + dec + LAdr

SX                  1.2 µsec + dec + LAAR          4.2 µsec + dec           1.2 µsec + dec + LAAR

SV, SC, SR          4.2 µsec + dec + LAAR          5.4 µsec + dec           3.6 µsec + LAAR + dec + LAdr

SV, SC, SR to TC                                   6.0 µsec + dec + LAdr

V+                  4.2 µsec + dec                 4.2 µsec + dec           3.6 µsec + dec + LAdr

V+C                 4.2 µsec + dec                 4.8 µsec + dec           4.2 µsec + dec + LAdr

V+CR (EM)           8.4 µsec + dec                 9.0 µsec + dec           8.4 µsec + dec + LAdr

V+CR (XS)           7.2 µsec + dec                 7.8 µsec + dec           7.2 µsec + dec + LAdr

KV, KC              4.2 µsec + dec                 4.2 µsec + dec           3.6 µsec + dec + LAdr

RNX                 9.0 µsec + dec + LAdr + LAAR

LVE                 7.8 µsec + dec                 6.6 µsec + dec           6.6 µsec + dec + LAdr

LVE as object       Additional 4.8 µsec            Additional 3.6 µsec      Additional 3.0 µsec + LAdr
INSN of LVE                                                                 (4.2 if LAMT)

SVA                 4.2 µsec + dec + LAAR          5.4 µsec + dec           3.6 µsec + dec + LAdr + LAAR

SVA to TC                                          6.0 µsec + dec + LAdr

ABBREVIATIONS:

dec = decode

LAAR = Look-ahead address register

LAdr = Look-ahead drain

LAMT = Look-ahead empty

TC = Time clock






TABLE 5. I-BOX TIMES - IMMEDIATE INDEX

Instruction             Basic Time          Variations

LVI, LCI, LRI, LVNI     2.4 µsec + dec

V+I, V-I                2.4 µsec + dec

V+IC, V-IC              3.0 µsec + dec      Also V ± ICR where
                                            Count ≠ 0

V+ICR, V-ICR            7.2 µsec + dec      1.2 µsec less if
                                            Refill from XS

C+I, C-I                2.4 µsec + dec

KVI, KVNI, KCI          2.4 µsec + dec

LVS                     5.4 µsec + dec      Additional
                                            1.8 µsec per ADD
TABLE 6. I-BOX TIMES - MISCELLANEOUS INSTRUCTIONS

Instruction     Address in EM               Address in XS       Address in IR

R (from EM)     7.8 µsec + dec + LAAR       7.2 µsec + dec      6.6 µsec + dec + LAAR + LAdr

R (from XS)     5.4 µsec + dec + LAAR       5.4 µsec + dec      4.2 µsec + dec + LAAR + LAdr

RCZ (C ≠ 0)     4.2 µsec + dec              3.0 µsec + dec      3.0 µsec + dec + LAdr

Z               1.2 µsec + dec + LAAR       3.6 µsec + dec      1.2 µsec + dec + LAAR


EX

1. Decode time, plus

2. Always begins with Look-ahead drain, plus

3. Instruction fetch time as follows:

    Address           Time

    0.0, 0.32         4.8 µsec
    1.0, 1.32         6.0 µsec
    2.0, 2.32         7.8 µsec
    3.0, 3.32         7.8 µsec
    4.0, 4.32         7.8 µsec
    5.0 - 11.32       11.2 µsec
    12.0, 12.32       7.8 µsec
    13.0 - 14.32      5.4 µsec
    15.0, 15.32       4.8 µsec
    16.0 - 30.32      4.2 µsec
    31.0, 31.32       6.0 µsec
    32.0 - up         5.4 µsec

4. Add decode and execution time for given instruction

5. Always ends with LA drain, plus

6. Add 3.0 µsec for fetch of next instruction
TABLE 6. I-BOX TIMES - MISCELLANEOUS INSTRUCTIONS (cont'd)


EXIC

1. Decode time, plus

2. Always begins with a Look-ahead drain

a) 4.2 µsec to fetch the pseudo-instruction counter are overlapped with
the Look-ahead drain if the psIC is in EM.

b) 3.0 µsec to fetch the pseudo-instruction counter are overlapped with
the Look-ahead drain if the psIC is in XS.

3. Instruction fetch time = 6.0 µsec

4. Stepping of pseudo-instruction counter

a) Add LAAR time if psIC in EM

b) Add .6 µsec time if psIC in XS

5. Add decode and execution time for given instruction

6. Always ends with LA drain, plus

7. Add 3.0 µsec for fetch of next instruction
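
The six EX steps listed in Table 6 add up as in the sketch below. It is illustrative only: the function names are hypothetical, the addresses are taken to be word addresses (so that 0.0 and 0.32 both fall in word 0), and the decode time, look-ahead drain time, and object instruction time are left as parameters because the report gives no fixed values for them.

```python
# (first word, last word, fetch time in microseconds), from the EX address table
EX_FETCH_US = [
    (0, 0, 4.8),
    (1, 1, 6.0),
    (2, 2, 7.8),
    (3, 3, 7.8),
    (4, 4, 7.8),
    (5, 11, 11.2),
    (12, 12, 7.8),
    (13, 14, 5.4),
    (15, 15, 4.8),
    (16, 30, 4.2),
    (31, 31, 6.0),
]


def ex_fetch_us(word_address):
    """Fetch time of the object instruction of EX, by word address."""
    for first, last, time_us in EX_FETCH_US:
        if first <= word_address <= last:
            return time_us
    return 5.4  # addresses 32.0 and up


def ex_total_us(word_address, dec_us, la_drain_us, object_us):
    """Sum the six EX steps; object_us is the decode and execution time of the
    object instruction (step 4), supplied by the caller."""
    total = (dec_us                      # 1. decode time
             + la_drain_us               # 2. begins with a look-ahead drain
             + ex_fetch_us(word_address) # 3. instruction fetch time
             + object_us                 # 4. object instruction
             + la_drain_us               # 5. ends with a look-ahead drain
             + 3.0)                      # 6. fetch of next instruction
    return round(total, 1)


# Fixed part only (decode, drain, and object times set to zero for illustration)
print(ex_total_us(20, dec_us=0.0, la_drain_us=0.0, object_us=0.0))
```

With the variable terms set to zero, an EX whose object instruction lies in words 16 through 30 has a fixed cost of 4.2 + 3.0 = 7.2 microseconds. Real figures are higher by the decode time, the two drains, and the object instruction itself.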


TABLE 7. I-BOX TIMES - UNCONDITIONAL BRANCH

Instruction   Basic Time                  SIC to EM                      SIC to XS

B             3.6 µsec + dec              4.8 µsec + dec + LAAR          4.2 µsec + dec

BR            4.8 µsec + dec              6.0 µsec + dec + LAAR          5.4 µsec + dec

BE            3.6 µsec + dec + LAdr       4.8 µsec + dec + LAAR +        4.2 µsec + dec
              (if previously disabled)    LAdr (if previously            + LAdr (if
                                          disabled)                      previously disabled)

BEW           Setup time as above on
              BE

NOP           1.2 µsec + dec              Same as Basic                  Same as Basic

BD            4.8 µsec + dec +            6.0 µsec + dec +               5.4 µsec + dec +
              2 LA drains                 2 LA drains                    2 LA drains
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TABLE 8. I-BOX TIMES - INDEX BRANCHES 


Instruction         Basic Time        SIC to EM                SIC to XS

CB (Successful)     4.2 µsec + dec    6.6 µsec + dec + LAAR    7.2 µsec + dec

CB (Unsuccessful)   3.6 µsec + dec    3.6 µsec + dec           3.6 µsec + dec

CBR (EM) (Succ)     6.6 µsec + dec    9.0 µsec + dec + LAAR    9.6 µsec + dec

CBR (XS) (Succ)     6.0 µsec + dec    8.4 µsec + dec + LAAR    9.0 µsec + dec

CBR (EM) (Unsucc)   6.0 µsec + dec    6.0 µsec + dec           6.0 µsec + dec

CBR (XS) (Unsucc)   5.4 µsec + dec    5.4 µsec + dec           5.4 µsec + dec


NOTE: CBR behaves like CB if Refill is not to be taken.
      Ex: CBR, branch on count ≠ zero


TABLE 9. I-BOX TIMES - TRANSMIT INSTRUCTIONS

Instruction    Setup time                Loop time    Termination time

T (EM → EM)    1.2 µsec + dec + LA dr    3.0 µsec     3.6 µsec
T (EM → XS)    1.2 µsec + dec + LA dr    4.2 µsec     0.6 µsec
T (XS → EM)    1.2 µsec + dec + LA dr    3.0 µsec     3.6 µsec
T (XS → XS)    1.2 µsec + dec + LA dr    4.2 µsec     0.6 µsec
T (IR → EM)    1.2 µsec + dec + LA dr    7.2 µsec     3.6 µsec
T (IR → XS)    1.2 µsec + dec + LA dr    7.2 µsec     0.6 µsec

Note: Transmit to EM and IR identical in time.

TABLE 9. I-BOX TIMES - TRANSMIT INSTRUCTIONS (cont'd)

Instruction    Setup time                Loop time    Termination time

S (EM - EM)    1.2 µsec + dec + LA dr    6.0 µsec     1.8 µsec
S (EM - XS)    1.2 µsec + dec + LA dr    6.6 µsec     1.8 µsec
S (EM - IR)    1.2 µsec + dec + LA dr    9.6 µsec     1.8 µsec
S (XS - EM)    1.2 µsec + dec + LA dr    6.0 µsec     1.8 µsec
S (XS - XS)    1.2 µsec + dec + LA dr    7.2 µsec     1.8 µsec
S (XS - IR)    1.2 µsec + dec + LA dr    10.8 µsec    1.8 µsec
S (IR - EM)    1.2 µsec + dec + LA dr    9.6 µsec     1.8 µsec
S (IR - XS)    1.2 µsec + dec + LA dr    10.2 µsec    1.8 µsec
S (IR - IR)    1.2 µsec + dec + LA dr    15.0 µsec    1.8 µsec

Note: The immediate/direct and the forward/backward options do not
affect the timing of either transmit or swap.


Analysis and Interpretation of the New Speeds Relative to Machine Environment

Machine Organization 

In conventional machines the instruction time is dependent upon the 
total length and delays along information paths, and the hardware places a 
severe limitation on performance. 

The organization of the 7030 has been devised to free the machine from
such limitations. To a large extent, the number of obstacles along the
information path is not of crucial importance; the speed and performance
of the machine are governed by the average frequency of information access.

For example, a 7030 memory box has a readout time of 1 µs and a recycle
time of 2.2 µs, yet in the 7030 system information can be obtained at the
rate of one word every 0.3 µs.

The 7030 system organization is characterized by the local autonomy
of the major units: the memory bus control unit (MBCU), the I-box, the
look-ahead (LA), the E-box, the exchange, and the disk exchange. (For
definitions of these terms the reader is referred to Appendix A.)

Each major unit is responsible for processing at top speed as long as
there is data to process. The local autonomy of the MBCU, exchange, and
disk exchange means that the central processing unit (CPU) can operate
independently of I/O operations. Within the CPU this local autonomy means
that temporary delays of up to several µs in one unit can be tolerated
without slowing down the entire pipeline. Extensive buffering of
information is needed to absorb such temporary delays, and within the 7030
CPU at any given time up to ten instructions can be in various stages of
processing. Within wide limits, the time for instruction processing is not
the sum of the times for instruction fetch, instruction error check,
decoding, operand fetch, operand error check, and execution, but is the
maximum of these times averaged over several instructions.
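
The difference between summing the stage times and taking their maximum can be illustrated with a small sketch. The stage times below are hypothetical values chosen only for illustration, and the model assumes ideal overlap with enough buffering; it is not a description of the actual 7030 control logic.

```python
# Hypothetical stage times in microseconds; these values are illustrative
# only and are not measurements from the report.
STAGE_TIMES_US = {
    "instruction fetch": 1.0,
    "instruction error check": 0.6,
    "decoding": 1.2,
    "operand fetch": 1.0,
    "operand error check": 0.6,
    "execution": 1.5,
}


def serial_time_us(stage_times):
    """Time per instruction if every stage waits for the previous one."""
    return sum(stage_times.values())


def overlapped_time_us(stage_times):
    """Steady-state time per instruction if all stages overlap fully."""
    return max(stage_times.values())


if __name__ == "__main__":
    print(f"serial:     {serial_time_us(STAGE_TIMES_US):.1f} us per instruction")
    print(f"overlapped: {overlapped_time_us(STAGE_TIMES_US):.1f} us per instruction")
```

Under these assumed numbers the slowest stage (execution) sets the pace, which is the sense in which an instruction stream is "E-box limited".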

The E-box times of the instructions listed in the previous section 
are the times realizable if the LA levels (which are buffers to the E-box) 
can always supply needed information to the E-box. The machine organi- 
zation is such that this is usually the case. 

For such a loosely coupled machine the information paths are actually
longer and the number of obstacles larger than in conventional tightly
organized machines. It is possible to create situations in which the
information path time influences instruction time. All of these
situations, so far as the CPU is concerned, have an effect on the LA
buffering, which in turn affects E-box performance.

Effect of Memory Interleaving

With the instruction buffers 1Y, 2Y (capable of housing four half-word
instructions) in the I-box, and the four LA levels, the advantage
of memory interleaving is fully exploited.

The demand of the E-box on LA is usually no higher than one word
per 1.5 µs (sequence of floating adds) and on the average, the memory is
not a factor. In terms of the four-level look-ahead, the requirement is
satisfied if any four E-box demands are fulfilled in 6 µs. The memory
interleave scheme allows four words every 2.2 µs (four box interleaving)
or four words every 4.4 µs (two box interleaving).
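
The supply and demand figures above can be checked with a short calculation. This is a simplified sketch: the function names are hypothetical, and it assumes consecutive words rotate evenly over the interleaved boxes with no conflicts.

```python
import math

CYCLE_US = 2.2        # recycle time of one memory box (from the text)
EBOX_DEMAND_US = 1.5  # one word per 1.5 us (sequence of floating adds)


def supply_time_us(words, boxes):
    """Time to deliver `words` consecutive words when addresses rotate
    evenly over `boxes` interleaved memory boxes (no conflicts assumed)."""
    return math.ceil(words / boxes) * CYCLE_US


def demand_time_us(words):
    """Time in which the E-box would consume `words` words."""
    return words * EBOX_DEMAND_US


def keeps_up(words, boxes):
    """True if memory can supply the words at least as fast as needed."""
    return supply_time_us(words, boxes) <= demand_time_us(words)


if __name__ == "__main__":
    for boxes in (4, 2, 1):
        print(boxes, supply_time_us(4, boxes), keeps_up(4, boxes))
```

With four words needed every 6 µs, both four-box and two-box interleaving keep ahead of the E-box in this model, while a single box would not.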

In actual computations, with the instructions occupying the two lower
memory boxes and data occupying the four upper memory boxes, memory
conflict is not expected to be an important factor, even if the I/O units
are in full operation. Delays due to repeated demands on the same memory
box are expected to be quite infrequent.

There are conflicts due to fetches and stores, independent of memory
access. These are related to the machine's measures for preserving
the logical integrity of a program sequence and will be discussed
elsewhere.

Overlapping of Decoding 

The I-box decoding, address indexing, and operand request continue
as long as instructions are available and as long as LA levels are
available for loading. For a sequence of floating point instructions the
decoding rate is one instruction per 1.2 µs. With few exceptions, this is
faster than the execution rate. Thus the decoding time for average
floating point sequences is completely overlapped by concurrent E-box
execution time. In other words, floating point instructions are E-box
limited. Address indexing has no measurable effect on the timing of
floating point instructions.

VFL instruction decoding is much more complicated. The minimum
time is 3.6 µs, which includes the loading of two LA levels. Each
additional level requires an additional 0.6 µs, and each indexing
operation requires 0.6 µs. Of course, there may be further slowdown in the
decoding process if the LA levels are not available for loading, or if the
second half of the instruction is not available when needed.
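
The VFL decoding figures above suggest a simple additive estimate. The sketch below assumes the increments simply add and that no waiting for LA levels or for the second half of the instruction occurs; the function name and parameters are hypothetical.

```python
def vfl_decode_time_us(la_levels=2, index_ops=0):
    """Estimated VFL decode time in microseconds.

    3.6 us minimum (includes loading two LA levels), plus 0.6 us per
    additional LA level and 0.6 us per indexing operation. Delays from
    unavailable LA levels or instruction halves are not modeled.
    """
    if la_levels < 2:
        raise ValueError("the 3.6 us minimum already includes two LA levels")
    if index_ops < 0:
        raise ValueError("index_ops must be non-negative")
    # Work in tenths of a microsecond to avoid floating-point drift.
    tenths = 36 + 6 * (la_levels - 2) + 6 * index_ops
    return tenths / 10


if __name__ == "__main__":
    print(vfl_decode_time_us())                          # minimum case
    print(vfl_decode_time_us(la_levels=4, index_ops=1))  # two extra levels, one indexing
```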

The decoding time of I-box instructions is 0.6 µs. Since these
instructions are executed in the I-box, the decoding time has been included
in the execution times. The processing time of I-box instructions can be
largely overlapped by concurrent E-box action, since their E-box time is
only 1.2 µs.

Whenever a branch instruction is successfully executed, the prefetched
instructions in 1Y, 2Y must be replaced by new instructions. The
latter have to be checked prior to use, and the decoding of the next
instruction will be delayed. This delay has been taken into account in the
timing given. Again, concurrent E-box action can overlap much of this.

Look-Ahead Levels 

In order to use the look-ahead (LA) levels efficiently, they must not 
be allowed to be empty over extended lengths of time, nor should they be 
crowded with data with little information content. 

The following instructions empty the LA completely: T, TI, SWAP,
SWAP I, RNX, BD.

All I-box instructions load LA levels for index register recovery
or for indicator register updating. These levels are not useful to the
E-box and have the effect of reducing the number of LA levels.

FPL data with word boundary crossover represent inefficient use of
LA levels.

Frequent demands on LAAR (such as in consecutive STORE instructions)
and prolonged decoding delays in the I-box in general often lead to a
half-empty look-ahead.

When the number of effective LA levels is reduced, the E-box may
become idle. The I-box processing time can then no longer be absorbed.

The I-Checker

The I-Checker is shared between the I-box and LA. It processes
information in 0.6 µs and is used for the following functions:

1. Instruction word error check with concurrent ECC to I-parity 
conversion. 

2. I-box data error check with concurrent ECC to I-parity conver- 
sion. 

3. Passage of data and VFL operation code from I-box to LA, with
concurrent I-parity to LA-parity conversion.

4. E-box fetch operand error check and check code conversion. 

5. Store operand error check and check code conversion. 

6. As part of data path between LA and I-box during, say, branch 
recovery. 

7. As part of I-box internal data path. 

8. As part of LA internal data path. 

The great majority of the demands on the I-Checker are due to 1, 2,
3, 4, and 5 above. It is therefore conceivable that I-Checker conflicts may
occur. The situation has not been completely studied (because of the
difficulty in subjecting the variables to program control), but it does not
seem to have had much effect on floating point operations. When an
I-Checker conflict occurs, the processing of one piece of data may have to
be delayed by 0.3 to 0.6 µs.

The LAAR, Stores and Internal Operand Fetches

All "to-memory" operations involving the main memory from the
7030 CPU are prepared by the I-box and accomplished by the look-ahead.
I/O stores are, however, performed directly between the MBCU and
exchange units.

The look-ahead address register (LAAR) is created for the purpose 
of containing the store address. LA levels are made available for the 
store operand. 

In the case of I-box store type instructions (SX, SV, SC, SR, SVA,
R, RXZ, T, SWAP, etc.) the store operands are already available during
I-box processing time and can be converted to ECC check bits during
shipment to LA, simplifying the store action in LA. In practically all
other cases the store operand will not be available during I-box
processing. In any case, the LAAR will remain "busy" from the time of
look-ahead loading until the proper operand is available, is fetched,
checked, and accepted by the MBCU.

In an instruction, the address may refer to the main memory, in- 
dex register storage, or an internal register. The I-box, upon decoding 
an instruction whose fetch operand is an internal register, uses the 
LAAR to store the internal operand address. This is because the needed 
operand is not available during the decoding stage, and the mechanism 
for internal operand fetch already exists for the handling of store instruc- 
tions. Unlike a standard store, no memory bus request is needed, and the 
LAAR is freed sooner. 

(By an internal operand address is meant address 3, or an address
between 5 and 12 inclusive. $Z, $IA, $MB, $RM, $FT and $TR are
bona fide main memory locations. $IT, $TC and $0 through $15 are in
index storage.)

Since there is only one LAAR, if the latter is busy whenever the
I-box decodes an instruction requiring LAAR, a wait must occur until the
LAAR is free for reloading. Therefore, LAAR-requiring instructions
should preferably be reasonably far apart, ideally with three or more
time-consuming instructions in between to ensure smooth I-box decoding.
Measurements on consecutive floating-point-to-memory operations (ST,
MT, LFT, etc.) do not therefore yield realistic timing information. On
the other hand, the placement of such instructions is usually beyond the
programmer's control.

A store into index storage is a time-consuming operation. Whenever
such an instruction is decoded, since the next instruction may make
use of the new index contents, the I-box does no further decoding until the
new index contents arrive.

Store Close to Fetch

Whenever the I-box requests a memory fetch, a comparison is made
with the contents of LAAR to avoid logical conflicts. In all cases,
logical conflicts will not produce wrong results, although some delays are
to be expected.

One such conflict is a store into a location corresponding to a
prefetched instruction, such as in the sequence:

A    ST (u), A + 0.32

     *+ (n), 1000. ($5)                    (1)

The second instruction, having been previously fetched into 1Y
or 2Y, clearly cannot have the correct information until the store is
complete. When such a store is decoded, the I-box invalidates the
prefetched instructions and waits until the correct information is
available.

Another conflict is a store into a nearby I-box operand address:

     ST (u), 1003. ($12)

     V+, $15, 1003. ($12)                  (2)

The execution of the second instruction is delayed until the correct
operand arrives.

A third such conflict is a store into a nearby E-box operand address: 

     ST (n), 1007.

     * (n), 1007.                          (3)

The I-box comparison with the LAAR shows that the operand will be 
in the LA prior to its arrival at the memory. A forwarding mechanism 
is activated to make the store operand available to the fetch level. Further 
LA loading in the I-box is delayed until the forwarding is completed. 

Forwarding is available also for consecutive fetch of the same mem- 
ory operand. Whenever a fetch-type instruction is decoded, the fetch ad- 
dress is gated into LAAR unless the latter is busy. This act does not 
make LAAR busy but can allow the forwarding on successive fetches. 

I-box Instructions 

The instructions executed by the I-box can largely be absorbed in an 
E-box limited environment. The following points, however, are to be noted: 

1. Some I-box instructions require the emptying of the LA. Con- 
current E-box operations would not be possible. 

2. Each I-box instruction results in one (sometimes more) level
of LA being loaded. This is done to enable the convenient
updating of certain indicators and to allow interruption action (if
needed) to take place in step with other interruptions. The
loaded level may be an index recovery level containing the old
contents of an index register to enable the restoration in case of
unexpected interruptions. It takes 0.9 to 1.2 µs to process an
LA level corresponding to I-box instructions. This time cannot
be overlapped by concurrent E-box action.

3. The E-box overlapping cannot be effective unless I-box
instructions are well dispersed.

4. I-box operand fetches are made when the instruction is decoded.
There is no look-ahead buffering to reduce the effective fetch
time. I-box immediate instructions, not requiring memory
access, are therefore faster than the "direct index arithmetic
instructions".

5. Some I-box instructions require more than one memory refer- 
ence. 

6. All successful branch instructions require the fetch of new instructions.
The instructions previously buffered into 1Y and 2Y are invalidated.

7. The treatment of some conditional branches requires extensive
E-box action and will be discussed in the next section. The
following branches, however, involve conditions known to the I-box
and are executed entirely within the I-box:

CB and variants

Bind for XF, XVLZ, XVZ, XVGZ, XCZ, XL, XE, XH.

Recovery Action 

During a machine run the I-box is usually several instructions ahead
of the E-box. There are certain situations, however, which may
force the I-box to refer to the same instruction as the E-box:

1. "Wrong branch" recovery, the I-box having made a wrong
assumption about the path to be chosen on a conditional branch.

2. Interruption.

These cases may require the I-box to move backwards in time to
be in step with the E-box. All changes in the index registers and/or
index indicators due to the advance processing by the I-box must now
be undone.

For conditional branches where the condition is unknown to the
I-box, the latter assumes the branch to be unsuccessful (to avoid
unnecessary invalidation of 1Y, 2Y contents) and processes ahead. Steps are
taken, however, to perform the actual test and to facilitate the
alternative path to be taken. Altogether four LA levels are used for each
such branch instruction: three levels not unlike those for a VFL
connect-to-memory instruction plus a branch recovery level. It is noted
that there is always a store type level, whether the programmer specifies a
change of the tested bit or not. For Bind, two of these four levels involve
the LAAR.

When the I-box guess proves correct, no particular action is taken
aside from setting of the tested bit. If, however, the guess is proved
incorrect by E-box arithmetic, the I-box has already processed ahead, and
recovery action has to be taken to ensure the logical integrity of the
program. The resetting of the I-box to the previous state often requires
shipment of index register information back to the I-box. The correct
instruction counter value is also shipped back to the I-box.

The interruption action is quite similar to branch recovery except
that in addition a new instruction has to be fetched and executed.

4

Computational Performance 


MATRIX INVERSION - GERB2 
Problem 

The inversion of Hilbert matrix segments of increasing size, using 
Jordan’s elimination method. In each case the determinant of the matrix 
is evaluated as a by-product. 

Program 

See Volume 2. 

Timing 


 10 x 10 matrix       0.02 seconds
 20 x 20 matrix       0.14 seconds
 30 x 30 matrix       0.43 seconds
 40 x 40 matrix       0.99 seconds
 50 x 50 matrix       1.89 seconds
 60 x 60 matrix       3.23 seconds
 70 x 70 matrix       5.10 seconds
 80 x 80 matrix       7.55 seconds
 90 x 90 matrix      10.69 seconds
100 x 100 matrix     14.61 seconds


Comparison With Other Machines 

The 96K memory allows the convenient inversion of matrices up
to 300 x 300 without using drums or tapes. The extra word length is
also an asset in inversion. The present program uses the relatively
slow *+ (multiply and add) instruction to obtain 96-bit intermediate
fraction accuracy.

Hilbert matrices are extremely ill-conditioned, and no claim is
made about the accuracy of the 100 x 100 inversion result. For an
ordinary matrix of size beyond 40 x 40, it is probably fair to say 27
fraction bits (as in the 7090) would not be adequate. The cost of double
precision on the 7090 is a 6-fold decrease in speed. The 7030 with 48
fraction bits is adequate for a much larger range.


Additional Remarks 

There is a faster matrix inversion program which gives the following
results:

 50 x 50 matrix       1.1 seconds
100 x 100 matrix       10 seconds
150 x 150 matrix       31 seconds
200 x 200 matrix       79 seconds
250 x 250 matrix      144 seconds
300 x 300 matrix      250 seconds


A 128 X 128 linear equation program with two sets of unknowns requires 
8.6 seconds. 


There is also a double precision version of GERB2, called GERB3,
with the following speeds:

10 x 10 matrix       0.05 seconds
20 x 20 matrix       0.41 seconds
30 x 30 matrix       1.36 seconds
40 x 40 matrix       3.19 seconds
50 x 50 matrix       6.21 seconds


It is seen that the double precision computation time is roughly triple
that of single precision. 7030 double precision is almost equivalent to
quadruple precision on the 7090 in terms of the number of fraction bits.
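
The "roughly triple" remark, and the growth of the single precision times with matrix size, can be checked directly from the published figures. The sketch below only re-uses the GERB2 and GERB3 timings quoted in this section; the variable names are hypothetical. Elimination methods perform on the order of n³ operations, so doubling n would be expected to multiply the time by about 8.

```python
# Timings in seconds, copied from the GERB2 and GERB3 tables in this section.
GERB2 = {10: 0.02, 20: 0.14, 30: 0.43, 40: 0.99, 50: 1.89,
         60: 3.23, 70: 5.10, 80: 7.55, 90: 10.69, 100: 14.61}
GERB3 = {10: 0.05, 20: 0.41, 30: 1.36, 40: 3.19, 50: 6.21}

# Double precision time divided by single precision time, per matrix size.
precision_ratio = {n: GERB3[n] / GERB2[n] for n in GERB3}

# Time ratio when the matrix dimension doubles (about 8 for n**3 growth).
doubling_ratio = {n: GERB2[2 * n] / GERB2[n] for n in GERB2 if 2 * n in GERB2}

if __name__ == "__main__":
    for n, r in precision_ratio.items():
        print(f"{n:3d} x {n:<3d} double/single = {r:.2f}")
    for n, r in doubling_ratio.items():
        print(f"{n:3d} -> {2 * n:<3d} time ratio   = {r:.2f}")
```

The smallest sizes are quoted to only two decimals, so their ratios are the least reliable.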
MATRIX MULTIPLICATION - MXM16


Problem 

Multiplication of two nxn matrices. 


Program 

Listings (including test program) are included in Volume 2.

Assume n = 17k + m. A "major" inner loop is traversed k times,
then a short inner loop is traversed m times, for each vector
multiplication.

The matrix elements used for the test are all equal to normalized 
floating point 1.0 and are placed in the upper memory (32768.0 and beyond). 


Timing

 25 x 25 matrices        0.20 seconds
 50 x 50 matrices        1.35 seconds
 75 x 75 matrices        4.74 seconds
100 x 100 matrices      10.71 seconds
125 x 125 matrices      21.44 seconds


Additional Remarks 

A simple version (taken directly from the 7030 programming ex- 
ample book) requires 15.5 seconds for 100 x 100 matrices. The timing 
cost was traceable to the use of the more accurate but relatively slow 
LFT; *+ sequence and the fact that two adjacent I-box instructions are 
executed for each traversal of the inner loop. 

In the present program the ratio of I-box instructions to arithmetic
instructions is greatly reduced, but no other attempt has been made to
speed up the program.

PRIME NUMBER GENERATION PROGRAM - PRIMC


Problem 

To generate prime numbers by the sieve of Eratosthenes, using 
VFL arithmetic and automatic program interruption. 

In a very large memory segment (octal 17740.40 through octal
272777.63), consecutive bits represent consecutive odd integers. The
bit at distance d from the beginning of the string thus represents the
odd number 2d + 1. In the beginning all bits in the string are set to 1's.

A working prime p is represented by a 1 bit to the right of the
previous working prime. In the beginning the first working prime is 3,
occupying the second bit of the string.

The bits representing the numbers p² + np, n = integer ≥ 0, are
systematically made zero by what appears to be an infinite loop, starting
from the case n = 0. When the prescribed upper memory boundary is
exceeded, an interruption causes exit from the loop. The next working
prime is then found, and the process is repeated, unless an end condition
is encountered. The end condition is met when, for a working prime p,
the bit corresponding to p² lies beyond the upper boundary. The nonzero
bits remaining in the interval represent prime numbers if the first
bit is reinterpreted to represent the even prime number 2.
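
The bit-string scheme described above can be sketched in a few lines. This is only an illustration of the data layout, not the PRIMC program: a bounded `range` stands in for the boundary interruption, a `bytearray` stands in for the bit string, and the function name is hypothetical. Because only odd numbers are stored, a step of p bit positions corresponds to a step of 2p in value.

```python
def odd_bit_sieve(num_bits):
    """Return the primes representable in a string of `num_bits` bits,
    where the bit at distance d represents the odd number 2d + 1."""
    bits = bytearray([1]) * num_bits      # all bits start as 1
    d = 1                                 # first working prime 3 is the second bit
    while True:
        p = 2 * d + 1
        start = (p * p - 1) // 2          # bit position representing p squared
        if start >= num_bits:             # end condition: p squared is out of range
            break
        # Zero every p-th bit from p squared on; the end of the range plays
        # the role of the boundary interruption that exits the loop.
        for pos in range(start, num_bits, p):
            bits[pos] = 0
        d += 1                            # next 1 bit to the right is the next prime
        while not bits[d]:
            d += 1
    # Reinterpret the first bit (the number 1) as the even prime 2.
    return [2] + [2 * k + 1 for k in range(1, num_bits) if bits[k]]


if __name__ == "__main__":
    print(odd_bit_sieve(50))   # primes among 2 and the odd numbers up to 99
```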

Program (See Volume 2) 

It is to be noted that 2/3 of the program consists of an interrup- 
tion table. 

Timing - 106.7 Seconds

The largest prime is (52,307,665)₈, or about 11 million. A slight
change in the inner loop (replacing the progressive indexing by ordinary
indexing and adding a V+I instruction) leads to 102.7 seconds.

Comparison With Other Machines

The 7030 machine can process bits very conveniently. Each
bit zeroing on the 7030 takes only one instruction. The corresponding
operation on the 7090 or similar machines requires very careful
programming and relatively long computation time (about 48 µs), even
if only 2⁵ = 32 bits are used per word to avoid a divide instruction.
Also, other machines do not have as many bits in the memory. For
this problem, the memory capacity of the 7030 is almost six times that of
the 7090.
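
The remark about using 2⁵ = 32 bits per word can be made concrete. On a word-addressed machine, clearing bit d means finding a word index and a bit offset; with a power-of-two number of bits per word, a shift and a mask replace the divide. The sketch below is an assumption-laden illustration in Python, not 7090 code, and its names are hypothetical.

```python
BITS_PER_WORD = 32      # 2**5, so no divide is needed
SHIFT = 5
MASK = BITS_PER_WORD - 1


def locate(d):
    """Split bit number d into (word index, bit offset within the word)."""
    return d >> SHIFT, d & MASK


def clear_bit(words, d):
    """Zero bit d in a list of 32-bit words, in place."""
    word, offset = locate(d)
    words[word] &= ~(1 << offset)


if __name__ == "__main__":
    words = [0xFFFFFFFF] * 3
    clear_bit(words, 70)
    print(locate(70), [hex(w) for w in words])
```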


A MONTE CARLO PROGRAM 
Problem 


The physics and method of solution of the problem are described by
Davis, Journal of Applied Physics, 1960. The original problem was coded
in Livermore on the 709 and 7090.


Briefly, the problem is the passage of particles through a right- 
angle bent pipe of circular cross-section at such a low density that only 
wall collisions are important. It is assumed that the angle of rebound 
is random and uncorrelated with the angle of incidence. The two ends 
of the pipe are each divided into 4 areas, and statistics are accumulated 
to determine the distribution of exit areas as a function of entry areas. 

The calculation on the 7090 differs in one respect from that
described in the paper. The calculation of random input angles and
rebound angles follows the "cosine law". The choice of these angles in
the 7030 code differs from the 709 in method but not in result.

A point is chosen from a uniform distribution in the rectangular
parallelepiped (-0.7, 0.7) x (-0.7, 0.7) x (0.0, 1.0). Unless

     (x² + y² + z²) < z

the point is discarded and a new one is tried. If the inequality is
satisfied the vector is used as a velocity vector relative to the X-Y
wall.
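
The accept-or-retry step above is a rejection sampling loop. The sketch below is a minimal illustration using Python's standard `random` module; the function name is hypothetical and nothing here reproduces the 7030 code. The acceptance test x² + y² + z² < z is equivalent to x² + y² + (z - 0.5)² < 0.25, that is, the point lies inside a sphere of radius 0.5 resting on the X-Y wall at the origin, so about 27 percent of trial points are accepted.

```python
import random


def sample_velocity(rng):
    """Draw points uniformly from (-0.7, 0.7) x (-0.7, 0.7) x (0.0, 1.0)
    until one satisfies x*x + y*y + z*z < z, then return it."""
    while True:
        x = rng.uniform(-0.7, 0.7)
        y = rng.uniform(-0.7, 0.7)
        z = rng.uniform(0.0, 1.0)
        if x * x + y * y + z * z < z:
            return x, y, z      # accepted: used as the velocity vector


if __name__ == "__main__":
    rng = random.Random(1960)   # fixed seed so the run is repeatable
    for _ in range(3):
        print(sample_velocity(rng))
```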

Program 

See Volume 2. 

Timing 

10,000 particles were run and the measured time was 33 seconds.
This is to be contrasted with a known 709 run of 5000 particles in 10
minutes.

For the machine run 20 cards had to be loaded and 36 numbers 
were printed on line. 

WEATHER FORECASTING STUDY 

The following weather forecasting study summarizes the results, 
to date, of timing experiments performed on STRETCH. 

Inner Loop 

This program steps in time one floating-point meteorological variable
at one ground point (i, j) of grid, and four floating-point meteorological
variables at each vertical k-level above the ground point. The equations
are non-linear integro-differential.

1. First form (A0, see below), lower memory: 6.4 ms.

2. Pseudo j-level (see below), lower memory: 0.7 ms.

3. A0, upper memory: 6.13 ms.

4. A0, lower (instructions) and upper (data) memory: 5.98 ms.

5. Second form (A1, see below), lower (instructions) and upper
(data) memory: 5.57 ms.

6. A1 Modified (see below), lower (instructions) and upper (data)
memory: 7.045 ms.

7. Third form (A2, see below), lower (instructions) and upper
(data) memory: 7.0135 ms.

The above times for runs 1, 3, 4, 5 are for one pseudo j-level,
two i-levels, and three k-levels per i-level. The time for run No. 2 is
for one pseudo j-level only, and the times for runs Nos. 6 and 7 are for
greatly reduced pseudo j-level, one i-level, and nine k-levels.

A0: Coding prior to any changes for timing improvement.

A1: Same as A0 except for coding changes to improve timing
with respect to accumulate multiply, division, add to
memory, separation of indexing instructions, loop entry,
CNOP, one-word transmits, and associated minor changes.

A2: Same as A1 Modified except for additional coding changes
to improve timing with respect to spacing of store instructions,
and more-than-one-word transmits.

Summary 

With respect to run No. 1, the above times provide the following
approximate improvements: run No. 3 - 4.2 percent; run No. 4 - 6.6
percent; run No. 5 - 13 percent. With respect to run No. 6, run No. 7
provides an approximate improvement of 0.4 percent. This last suggests
that this program is insensitive to the spacing of the store instructions
and the multi-word transmit.
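
The quoted percentages follow from the run times listed for the inner loop. The short check below recomputes them; the helper name is hypothetical.

```python
def improvement_percent(base_ms, new_ms):
    """Percentage reduction in run time relative to the base run."""
    return 100.0 * (base_ms - new_ms) / base_ms


if __name__ == "__main__":
    # Run times in milliseconds, from the numbered list of runs.
    print(round(improvement_percent(6.4, 6.13), 1))      # run 3 vs run 1
    print(round(improvement_percent(6.4, 5.98), 1))      # run 4 vs run 1
    print(round(improvement_percent(6.4, 5.57)))         # run 5 vs run 1
    print(round(improvement_percent(7.045, 7.0135), 1))  # run 7 vs run 6
```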
Grid Parameters


The execution of this program constitutes a negligible part of the
execution time required for the Weather program. Also, the program performs
a very specialized function, arising from the particular grid-geometry
and interpolation chosen for the Weather Project. Consequently, no coding
changes in grid parameters were made for purposes of timing improvement
or further study. Grid parameters computes, as a function of the
grid size (parameter N), certain grid measurements and interpolation
data (VFL and floating-point).

Using N = 3, lower memory: 7.07 ms 

Elementary Function Test 

ln(e^x) is compared against x, then a polynomial evaluation of e^(ln x)
is compared against x. The evaluations are based on 8-decimal accuracy
subroutines, employing polynomial methods. The program uses lower
memory exclusively.

In the following chart, certain similar operations have been
grouped together (e.g., / and R/; V+, V-, V+I, V-I; +, M+ under Fl. Pt.).
Also, the time given for A0 is that for the more favorable memory
arrangement (i.e., instructions in lower, data in upper).

           Operation      A0        Pseudo    A1        A1        A2        Grid      Elem. Functions
                                    j-level             Modified            Param.    1st Form  2nd Form

Fl. Pt.    LFT            144       -         -         -         -         39        -         -
           L              257       1         437       633       651       243       7         7
           ST             261       1         465       659       677       202       15        15
           +              250       -         394       581       581       195       28        28
           *              220       -         434       637       637       187       26        26
           /              88        -         42        51        51        42        2         2
           *+             144       -         -         -         -         39        -         -
           D*             -         -         -         -         -         -         2         -
           K              -         -         -         -         -         45        2         2
           SHF            -         -         -         -         -         -         2         -
           E+             -         -         -         -         -         69        4         4
           SRT            -         -         -         -         -         4         -         -
           Total          1364      2         1772      2561      2597      1065      88        84

VFL        L              10        8         10        9         9         128       4         2
           ST             2         -         2         1         1         93        6         2
           +              20        18        20        19        19        20        -         -
           CM             66        66        70        71        71        9         -         -
                          -         -         -         -         -         8         -         -
           vri.           -         -         -         -         -         4         -         -
           *+             -         -         -         -         -         4         -         -
           -MG            -         -         -         -         -         4         -         -
           M+1            -         -         -         -         -         37        -         -
           K              -         -         -         -         -         64        -         -
           Total          98        92        102       100       100       371       10        4

Indexing   LZ,LV,LC,SV    13        9         13        11        11        6         -         -
                          60        -         60        78        78        117       -         -
                          15        1         15        20        20        13        -         -
           KCI            12        -         12        18        18        -         -         -
           Total          100       10        100       127       127       136       0         0

Branching  B, BD, BEW     77        5         68        91        91        6         8         10
           BIND           18        -         18        27        27        274       3         3
           BB             2         -         2         1         1         1         -         -
           Total          97        5         88        119       119       280       11        13

Miscell.   Z, TI
           Total          52        -         28        38        29        31        1         1

Grand Total               1711      109       2090      2945      2972      1883      114       106

Execution Time (ms)       5.98      0.7       5.57      7.045     7.0135    7.07      0.38      0.309

Aver. time (µs)/opn       3.5       6.4       2.7       2.4       2.36      3.8       3.3       2.9
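
The average time per operation in the last row is the execution time divided by the grand total of operations. The sketch below recomputes that row from the two rows above it; the dictionary keys are descriptive labels chosen here for illustration.

```python
# (grand total of operations, execution time in ms), taken from the chart.
RUNS = {
    "first form": (1711, 5.98),
    "pseudo j-level": (109, 0.7),
    "second form": (2090, 5.57),
    "second form modified": (2945, 7.045),
    "third form": (2972, 7.0135),
    "grid parameters": (1883, 7.07),
    "elementary functions, 1st form": (114, 0.38),
    "elementary functions, 2nd form": (106, 0.309),
}

# Average time per operation in microseconds (1 ms = 1000 us).
avg_us_per_op = {name: 1000.0 * ms / ops for name, (ops, ms) in RUNS.items()}

if __name__ == "__main__":
    for name, value in avg_us_per_op.items():
        print(f"{name:32s} {value:5.2f} us per operation")
```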
Reliability and Serviceability


The capability of a large computer complex to satisfy operational
requirements must extend beyond the scope of engineering design and
programming flexibility. Maintainability, achieved through the
application of reliability and serviceability principles during the design
stage and the planning of maintenance procedures, is a key to maximum
system availability. Among the special features of STRETCH are extensive
error-checking and error-correcting circuitry, test panels, marginal
(bias) checking, and automatic scanning and recording. Maintenance
aids include diagnostic programs for testing and maintaining the
equipment and system, selection and training of field maintenance personnel,
unique distribution and function of spare parts depots, comprehensive
maintenance manuals, and applied programming support.


PERFORMANCE CHARACTERISTICS 

Based on statistical analyses of this class of large scale computer 
systems and correlation of the analyses with an operating 7030 system, it 
is estimated that the typical 7030 Data Processing System will have the 
following operating characteristics within 19 months after installation 
of the first system and provided the system on which they are measured 
has been installed for at least 6 months. These estimates are based 
upon the best available information on component failure rates, and
best estimates of the impact of the automatic error correction feature.


7030 System Availability for Acceptance Purposes 

Availability for acceptance purposes is indirectly based on a specific 
allotment of time for preventive maintenance during each 24 hour period 
of operation. It is defined as the ratio of productive customer hours to 
scheduled customer hours (defined as that portion of a regularly 
scheduled shift assigned to customer operation). 


                (Scheduled Customer Hours) - (Unscheduled Maintenance Hours) 
Availability =  ------------------------------------------------------------ 
                                  Scheduled Customer Hours 
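
The availability ratio defined above can be sketched in a few lines of Python. This is an illustration only; the function name and the sample figures (20 scheduled hours, 2.6 unscheduled maintenance hours) are hypothetical and are not taken from the report. 

```python
def availability(scheduled_customer_hours: float,
                 unscheduled_maintenance_hours: float) -> float:
    """Availability for acceptance purposes, as defined in the text:
    (scheduled - unscheduled maintenance) / scheduled."""
    if scheduled_customer_hours <= 0:
        raise ValueError("scheduled customer hours must be positive")
    if not 0 <= unscheduled_maintenance_hours <= scheduled_customer_hours:
        raise ValueError("unscheduled hours must lie within scheduled hours")
    productive = scheduled_customer_hours - unscheduled_maintenance_hours
    return productive / scheduled_customer_hours


# Hypothetical example: 20 scheduled hours, 2.6 hours lost to unscheduled calls.
example = availability(20.0, 2.6)
print(f"availability = {example:.2f}")
```

Scheduled (preventive) maintenance does not appear in the ratio, because only the hours assigned to customer operation are counted in the denominator. 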


Average Time Between Unscheduled Maintenance Periods 

On the average, it is estimated that 6 or more hours of customer 
operation will normally occur between unscheduled maintenance calls. 
This suggests one unscheduled call per operating shift. 


Duration of Unscheduled Maintenance Calls 

The average duration of an unscheduled maintenance call is 
estimated to be 1.3 hours. Ninety percent of the unscheduled calls can 
be expected to be less than 2.5 hours in duration. 
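
As a rough illustration, the two estimates above (6 hours of customer operation between unscheduled calls, 1.3 hours per call) can be combined into an availability figure. The sketch below assumes, as a simplification not stated in the report, that scheduled customer time consists only of alternating periods of operation and unscheduled repair. Because the 6 hour figure is a minimum, the result is best read as a lower bound. The function name is hypothetical. 

```python
def estimated_availability(hours_between_calls: float,
                           mean_call_hours: float) -> float:
    """Long-run fraction of scheduled customer time that is productive,
    assuming operation and unscheduled repair simply alternate."""
    cycle = hours_between_calls + mean_call_hours
    return hours_between_calls / cycle


# Figures quoted in the text: 6 hours between calls, 1.3 hours per call.
estimate = estimated_availability(6.0, 1.3)
print(f"estimated availability of at least {estimate:.3f}")
```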


Scheduled Maintenance 

Based on purely technical equipment requirements, the duration 
of a scheduled maintenance period will be less than four hours. Twice 
each calendar year, a 24 hour period must be reserved for scheduled 
maintenance. There should be 16 or more hours of customer operation 
between scheduled maintenance periods. On a regularly planned basis, 
a four hour period of scheduled maintenance is required during every 
24 hour operating period. 


DIAGNOSTIC PROGRAMMING 

Automatic diagnostic programming, the most advanced technique 
devised for rapidly isolating difficult and intermittent system 
incompatibilities, has been scientifically approached in SAGE I, refined for 
other large scale computer systems, and is an integral part of the 
7030 Data Processing System. Rapid isolation of malfunctions to a 
limited area as diagnosed automatically by the computer itself and 
multiple replacement substantially increase system efficiency. 

Good diagnostic programming provides three major advantages: 

1. A means of computer analysis which maximizes the success 
of customer machine program performance. 

2. A means of isolating failures faster to maximize customer 
machine time availability. 

3. A means for establishing an intimate man-to-machine 
relationship. 

Diagnostic programs are written with reference to the machine 
logic, as well as with reference to machine specifications. Thus these 
programs ensure that all hardware is tested and that the machine will 
operate properly with worst-case patterns and timing relationships. 

Diagnostic testing under program control proceeds with the building 
block technique where possible. This is a technique which starts by 
testing the smallest amount of hardware, and then gradually adds 
subsequent tests involving the minimum increment of additional circuitry 
possible. In this fashion a failure at any one point is isolated to those 
logical blocks added in the failing test. Testing continues until all 
logical blocks are covered. At this point, worst patterns and timing 
relationships are introduced to test for electronic interactions between 
areas of equipment. 
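
The building block technique described above can be sketched as follows. This is a minimal illustration in Python, not the actual diagnostic control program; the function name, the test tuples, and the block names used in the example are all hypothetical. 

```python
from typing import Callable, Iterable, Optional, Set, Tuple

# Each test is (name, logical blocks it exercises, function returning True on pass).
Test = Tuple[str, Set[str], Callable[[], bool]]


def isolate_failure(tests: Iterable[Test]) -> Optional[Tuple[str, Set[str]]]:
    """Run tests in order of increasing hardware coverage.

    Returns (failing test name, blocks newly added by that test), or None
    if every test passes. Blocks already covered by earlier passing tests
    are excluded, which is what narrows the failure down.
    """
    covered: Set[str] = set()
    for name, blocks, run in tests:
        newly_added = set(blocks) - covered
        if not run():
            return name, newly_added
        covered |= newly_added
    return None


# Hypothetical sequence: the second test adds only the "adder" block and fails.
sequence = [
    ("t1", {"clock"}, lambda: True),
    ("t2", {"clock", "adder"}, lambda: False),
    ("t3", {"clock", "adder", "shifter"}, lambda: True),
]
print(isolate_failure(sequence))
```

The smaller the increment of circuitry each test adds, the smaller the set of suspect blocks returned on a failure. 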

Diagnostic programs are in use to test and maintain the present 
7030 systems. In addition, a SEVA (Systems Evaluation) program will 
provide a comprehensive over-all test of the 7030 System under 
conditions similar to those expected for an operational program. A brief 
description of these diagnostic programs is presented below, followed 
by a description of the SEVA program. 

• Diagnostic Control Program - This is an executive program 
to control the running of all diagnostic programs, providing 
standard options and common utility routines. 

• I-Box - Tests all the controls and transfer paths necessary 
for the execution of instructions in the instruction unit. 

• SAU — Tests the transfer paths, arithmetic elements, and 
controls of the serial arithmetic unit. 

• PAU — Tests the transfer paths, arithmetic elements, and 
controls in the parallel arithmetic unit. 

• Look Ahead — Tests the transfer paths and controls of the 
look-ahead unit. 

• Memory - Tests the memories and memory bus, including a 
worst-patterns test. 

• CPU Scan and Miscellaneous - Tests the central processor 
unit (CPU) scan circuitry, clocks, and boundary registers. 

• BX 0 - This program provides for manually testing basic 
functions of the exchange without using the central processor. 

• BX 1 — Tests those portions of the exchange which can be 
tested without using any I/O devices. 

• BX 2 - Tests the common circuitry in the exchange using 
available devices which can be selected by the operator. 

• BX 3 — Tests the Exchange under worst conditions of simul- 
taneous operation. 
• Tapes - Tests the tape adapters and tape drives, including the 
control, information transfer, and timing circuitry. 

• Printer — Tests the printer control unit and the chain printer. 
Carefully selected patterns will be printed for the operator to 
verify results. 

• Console - Tests the console control unit and the operator's 
console. The patterns typed on the typewriter are verified by 
the operator, while the operator's inputs are verified by the 
computer. 

• Reader, Punch - This program provides a punch to reader 
loop for testing the reader, reader control unit, punch, and 
punch control unit. 

• High-Speed Exchange — This program utilizes the disk file in 
an elementary fashion to test the disk synchronizer. 

• Disk — Tests the disk and includes worst patterns and timing 
tests. 

• I/O Scan — Tests the circuitry associated with the exchange 
scan and the disk scan. 

SEVA Program 

The 7030 SEVA (Systems Evaluation) program is designed 
primarily to test the interrupt, asynchronous capabilities, and capacity 
of the 7030 system in a manner similar to that in which a customer will 
operate the system. SEVA is made up of a number of analytical and 
mathematical checking routines and is designed so that the I/O routines 
will cycle independently of, and concurrently with, each other and the 
central processor unit routines. Because of the asynchronous operation of 
the 7030 system, phase shifts will occur, producing an almost infinite 
variety of timing conditions within the system. During operation, 
progress of the SEVA program, required operator action, and 
inconsistencies are printed out by the printer for evaluation and analysis 
purposes. 

The SEVA program will contain three general types of routines: 
control routines, central processor unit routines, and input-output 
routines. Figure 1 illustrates the inter-relationship between the 
various routines. A brief description of these routines follows: 

• Initiator — This routine establishes initial conditions, as re- 
quired, for all of the other routines. 

• I/O Initiator - This routine operates with the initiator routine 
to start from one to eight tape routines and one disk routine 
and is used only when first starting or restarting the over-all 
SEVA program. 

• Central Processor Unit Routine Sequencer - This routine 
establishes the order in which the other central processor 
unit and memory routines will be run. After each pass 
through all of the central processor unit routines, the order 
is permuted so that the central processor unit routines will 
not be run in the same order on each successive pass. 

• Central Processor Unit Routines — Central processor unit 
routines will calculate mathematical problems such as 
binomial expansions and solutions of quadratic equations. 
These routines will contain four sections to accommodate 
both variable field length and floating point calculation, using 
different techniques so that the results can be compared. 
Provision is made within the SEVA program to add other 
routines (indicated as variable routines in figure 1) as 
desired. 

Figure 1. 7030 SEVA Program (flow chart not reproduced) 
• Memory Routines - Memory routines will perform instruction 
fetch (IF) and data store (DS) functions in all available core 
storage units. Independently, the result will be generated in a 
fixed memory area for comparison to the other result 
generated by this routine. 

• Tape Routines - There are eight identical tape routines, one 
for each tape adaptor, and they will exercise every possible 
operation. 

• Disk Routine — This routine is similar to the tape routines in 
that it will exercise all possible disk operations. 

• Typewriter, Punch, Reader Loop Routine - With this routine, 
data entered through the typewriter will be typed and punched. 
Punched data from the reader will be typed again to verify 
results. 

• Input-Output Terminator — When a tape adaptor or disk has 
completed a routine, the input-output terminator routine will 
temporarily delete the equipment from the over-all SEVA and 
start another tape adaptor or disk in its place. When available, 
the disk and eight tapes will be running simultaneously with the 
central processor unit routine. 

• Interrupt Control - This routine will handle all interrupts. If 
two error interrupts occur within a 30-second period, the 
program will be dumped. For single error interrupts within a 
30-second period, the error will be logged and the operation 
will continue. 
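
The error handling rule of the Interrupt Control routine can be sketched as below. This is a minimal illustration in Python under stated assumptions: the class and method names are hypothetical, time is supplied by the caller in seconds, and an interval of exactly 30 seconds is treated as falling within the period (the report does not say which way the boundary goes). 

```python
from typing import List, Optional, Tuple


class ErrorInterruptMonitor:
    """Logs error interrupts and decides whether to continue or dump."""

    WINDOW_SECONDS = 30.0  # period quoted in the text

    def __init__(self) -> None:
        self.last_error_time: Optional[float] = None
        self.log: List[Tuple[float, str]] = []

    def on_error_interrupt(self, now: float, description: str) -> str:
        """Return "dump" if this is the second error within the window,
        otherwise log the error and return "continue"."""
        self.log.append((now, description))
        previous = self.last_error_time
        self.last_error_time = now
        if previous is not None and now - previous <= self.WINDOW_SECONDS:
            return "dump"
        return "continue"


monitor = ErrorInterruptMonitor()
print(monitor.on_error_interrupt(0.0, "tape check"))    # first error: logged
print(monitor.on_error_interrupt(45.0, "disk check"))   # 45 s later: logged
print(monitor.on_error_interrupt(60.0, "memory check")) # 15 s later: dump
```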
Appendix A 


7030 CPU CONFIGURATION 
Main Memory 

The main memory, sometimes called extended memory, is 
composed of six boxes of 16,384 extended words each (for the Los Alamos 
configuration). Each extended word contains 64 information bits plus 
8 error-check bits. 
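
The figures above imply the following capacities for the Los Alamos configuration. The short Python sketch below only restates that arithmetic; the function and key names are hypothetical. 

```python
def main_memory_summary(boxes: int = 6,
                        words_per_box: int = 16_384,
                        info_bits: int = 64,
                        check_bits: int = 8) -> dict:
    """Capacity figures derived from the word counts quoted in the text."""
    total_words = boxes * words_per_box
    return {
        "total_words": total_words,
        "information_bits": total_words * info_bits,
        "stored_bits": total_words * (info_bits + check_bits),
        # Check bits expressed as a fraction of the information bits.
        "check_overhead": check_bits / info_bits,
    }


summary = main_memory_summary()
for key, value in summary.items():
    print(f"{key}: {value}")
```

With the default arguments this gives 98,304 extended words, 6,291,456 information bits, and 7,077,888 stored bits, a check-bit overhead of 12.5 percent. 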

The main memory is controlled by the memory bus control unit 
(MBCU) which, in addition to initiating all accesses to main memory, 
also monitors the submitted addresses for the Address Invalid (AD) 
condition. The MBCU unit has direct contact with the following units: 
instruction unit, Look-Ahead, memory boxes, exchange, and disk 
exchange. 

Instruction Unit 

The instruction unit (I-box) contains the instruction counter (IC), 
the 16 index registers ($0-$15), the time clock ($TC) and interval timer 
($IT), the "originals" of the index condition indicators ($XF, $XVLZ, 
$XVZ, $XVGZ, $XCZ, $XL, $XE, $XH), and many registers and circuits 
needed for efficient decoding and execution of instructions. The I-box 
executes the following activities: 

• Generates all instruction-fetch requests on the basis of IC 
contents. 

• Develops effective addresses by adding the pertinent index 
value to the numerical address. 

• Generates E-box operand requests for look-ahead. 

• Partially decodes E-box instructions and converts the latter 
into information suitable for look-ahead processing. This 
information is then loaded into look-ahead. 
• Decodes and executes all index arithmetic instructions as well 
as the following: R, RCZ, EX, EXIC, T, SWAP, except that 
stores are performed with the help of look-ahead. If non-I-box 
operands are required, they will be fetched from the MBCU or 
the look-ahead. 

• Decodes and executes the following branch instructions: B, BE, 
BD, BR, BEW, CB, CBR, and Bind. Bind for non-index conditions 
requires the assistance of the look-ahead and the E-box. 

• Submits indicator conditions for the following indicators to 
look-ahead for updating of the indicator register: $IJ, $OP, 
$AD (from MBCU), $DS, $DF, $IF + index indicators. These 
indicators plus a conditional machine check may lead to 
interruption during the updating. Other I-box-generated indicators 
are gated directly to the indicator register. 

• Updates $TC and $IT every 1/1024 second. 

Look-Ahead (LA) 

Look-ahead contains four buffer levels, an address register (LAAR), 
and five counters: IAUC (I-box), OCC (operand check), TBC (transfer 
bus into E-box), ABC (arithmetic bus and interruption system), and 
SCC (store check). Look-ahead functions under the following conditions: 

• When IAUC refers to a level, that level may accept I-box 
loading, although the required operand may come later from 
MBCU. 

• During OCC time, the operand from MBCU may be checked for 
error. 

• During TBC time, the level may be shipped into E-box. 

• During ABC time, the interruption system is updated and a 
signal is given to E-box for execution of the instruction just 
loaded. 
• During SCC time, the store operand (if any) is checked and 
sent to MBCU on the basis of the contents of LAAR. 

• Provides interlocks, plus close contact with the interruption 
system, to ensure smooth, autonomous, and error-free oper- 
ations of the various units in the computer. 
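
One way to picture the five counters is as pointers chasing one another around the four buffer levels. The Python sketch below is an illustration only, not a description of the actual circuitry. It assumes that every level passes the counters strictly in the order IAUC, OCC, TBC, ABC, SCC, and that IAUC may reload a level only after SCC has released it; neither assumption is spelled out in the report. The class and method names are hypothetical. 

```python
from typing import Optional

STAGES = ["IAUC", "OCC", "TBC", "ABC", "SCC"]  # order assumed from the text
LEVELS = 4                                     # four look-ahead buffer levels


class LookAheadRing:
    """Toy model: each counter steps around the four levels in turn."""

    def __init__(self) -> None:
        # Number of levels each counter has processed so far.
        self.done = {stage: 0 for stage in STAGES}

    def can_advance(self, stage: str) -> bool:
        index = STAGES.index(stage)
        if index == 0:
            # The loader may not overwrite a level SCC has not yet released.
            return self.done["IAUC"] - self.done["SCC"] < LEVELS
        # Any later counter must stay behind the counter preceding it.
        return self.done[stage] < self.done[STAGES[index - 1]]

    def advance(self, stage: str) -> Optional[int]:
        """Process the next level for this counter; return the level number,
        or None if the counter has to wait."""
        if not self.can_advance(stage):
            return None
        level = self.done[stage] % LEVELS
        self.done[stage] += 1
        return level


ring = LookAheadRing()
print([ring.advance("IAUC") for _ in range(5)])  # fifth load has to wait
```

In this model the interlocks mentioned above reduce to two comparisons: a counter waits for the one ahead of it, and loading waits for a free level. 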

Interrupt System 

The interrupt system contains the indicator register ($IND), the 
mask register ($MASK), and $CA and $CPUS. It has direct connections 
with I-box, LA, E-box, and the exchange units to receive updated 
indicator information. 

• Interruption occurs if: 

a. System is enabled. 

b. A masked indicator bit is a 1. 

• Interruption sequence: 

a. The left-most masked indicator bit position (for instance, 
0.K) is noted. 

b. The I-box is house-cleaned, except for the index storage. 

c. LA house-cleaning is performed. Recovery information is 
shipped back to the I-box. This includes all index register 
recovery information and the interrupted IC value. 

d. The contents of $IA are fetched and added to K.0. 

e. The instruction beginning at the address C($IA) + K.0 is fetched 
from MBCU without disturbing the $IF indicator. 

f. The "free" instruction is performed with all masked 
interruption conditions enforced. This instruction may or may 
not alter IC in I-box. 

g. The I-box fetches new instructions on the basis of IC. 

h. Normal operations resume. 
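
Steps a, d, and e of the interruption sequence amount to finding the left-most indicator bit that is both on and masked, then adding its position K to the contents of $IA. The Python sketch below illustrates that calculation only. It assumes a 64-bit indicator word with bit 0 at the left, models registers as plain integers, and uses hypothetical function names; it is not a description of the hardware. 

```python
from typing import Optional

WORD_BITS = 64  # assumed width of the indicator and mask registers


def leftmost_masked_indicator(ind: int, mask: int,
                              width: int = WORD_BITS) -> Optional[int]:
    """Position K (0 = left-most bit) of the first indicator that is on
    and masked, or None if no such indicator exists."""
    pending = ind & mask
    if pending == 0:
        return None
    return width - pending.bit_length()


def interrupt_address(ind: int, mask: int, ia: int,
                      enabled: bool = True) -> Optional[int]:
    """Word address of the "free" instruction: C($IA) + K."""
    if not enabled:
        return None
    k = leftmost_masked_indicator(ind, mask)
    return None if k is None else ia + k


def bit(position: int, width: int = WORD_BITS) -> int:
    """Integer with only the given bit set, counting from the left."""
    return 1 << (width - 1 - position)


# Hypothetical case: indicators 13 and 18 are on, all indicators masked.
indicators = bit(13) | bit(18)
print(interrupt_address(indicators, mask=(1 << WORD_BITS) - 1, ia=1000))
```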
E-Box (Arithmetic Unit) 

The E-box contains a parallel arithmetic unit (PAU) for 
floating-point fraction operations, as well as all executed *, /, *+, and 
binary-decimal conversions. It contains a serial arithmetic unit (SAU) for 
variable field length operations (except *, /, *+, and conversions), as 
well as for floating point exponent arithmetic. 

The E-box: 

• Receives instructions and operands from LA for decoding and 
execution. 

• Submits store operands to LA. 

• Submits arithmetic indicator bits to the interrupt system. 

The E-box also contains the following registers: 

1. Accumulator ($L, $R) and sign byte register ($SB). 

2. Buffer registers C, D. 

3. PAU buffer register F. 

4. Left zero counter ($LZC) and all ones counter ($AOC). 

Exchange 

The exchange contains up to 32 channels (32-63) for simultaneous I/O 
processing. Through adaptors each channel can be connected to eight tape 
units or to one non-tape I/O unit. Each channel is represented by one 
control word and two data words in the exchange memory. The channels 
communicate with the exchange memory through a multiplexer. 

The exchange unit contains a main memory address register 
(MMAR) and a buffer register to communicate with the MBCU. It also 
contains an interruption address register (IAR) for the channel which has 
created an interrupt condition, as well as triggers (EOP, UK, EK, EE, and 
CS) to indicate the reason for interrupt. These triggers and IAR contents 
are set until the interruption system accepts the conditions. When IAR is 
busy (Interrupt Wait trigger on), other channels cannot use it to cause 
other interrupts. 

The exchange unit: 

• Accepts I/O instructions from LA (2 levels per instruction). 

• Fetches and stores control words and data words directly from 
MBCU, subject to $AD restrictions but not $DF and $DS, since 
these are performed by the I-box. 

• Communicates with I/O units (through adaptors and multiplexer) 
in 8-bit bytes. 

• Has its own clocking circuit (1.0 μs cycles divided into 10 
equal pulses) and ECC check-bit generator-comparer. 

Maximum word rate is 1 extended word/10 μs for the entire 
exchange. 

The disk exchange contains 32 channels (0-31), only one of which 
can be in operation at any given time. It contains enough memory for 1 
control word and 1 data word, and each channel can be attached to one 
disk unit. Disk word rate is 1 extended word/8 μs. No direct chaining 
of the I/O control words is permitted. In addition, the copy-control-word 
operation cannot be performed when reading or writing. Otherwise 
the disk exchange functions in much the same way as the exchange. 
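
The two word rates quoted above can be converted to more familiar units. The Python sketch below is a worked example only; it counts the 64 information bits of each extended word (eight 8-bit bytes) and ignores the check bits, and the function names are hypothetical. 

```python
INFO_BITS_PER_WORD = 64  # information bits per extended word
BITS_PER_BYTE = 8        # the exchange handles 8-bit bytes


def words_per_second(microseconds_per_word: float) -> float:
    """Peak word rate for a given time per extended word."""
    return 1_000_000 / microseconds_per_word


def bytes_per_second(microseconds_per_word: float) -> float:
    """Peak information rate in 8-bit bytes per second."""
    bytes_per_word = INFO_BITS_PER_WORD / BITS_PER_BYTE
    return words_per_second(microseconds_per_word) * bytes_per_word


for unit, us_per_word in (("exchange (all channels)", 10.0),
                          ("disk exchange", 8.0)):
    print(f"{unit}: {words_per_second(us_per_word):,.0f} words/s, "
          f"{bytes_per_second(us_per_word):,.0f} bytes/s")
```

On these figures the entire exchange peaks at 100,000 words (800,000 bytes) per second, and the disk exchange at 125,000 words (1,000,000 bytes) per second. 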

LIST OF IMPORTANT REGISTERS IN THE 7030 
I-Box 


Table A-1 indicates important I-box registers in the 7030. 
TABLE A-1. I-BOX REGISTERS 

Code      Name                      Bit Information        Remarks 
--------  ------------------------  ---------------------  ---------------------------- 
1Y*       Instruction buffer        64 + check bits        Even-addressed full words 
2Y*       Instruction buffer        64 + check bits        Odd-addressed full words 
Z         Instruction preparation   64 + check bits 
          and execution register 
XS        Index register            17 words, each with    Contains location 1.0, 
                                    64 bits + check bits   16.0-31.0 
X         Index data register       64 + check bits        Buffer register for XS 
X-Adder   Index adder               32 + check bits        Capable of 24-bit additions 
W         Work register             18 + check bits        Serves miscellaneous 
                                                           functions in I-box; LVS 
                                                           address decoding; second 
                                                           operand address in VFL; 
                                                           refill and interruption 
                                                           address; count for T and 
                                                           SWAP; LVS address decoding. 
IC        Instruction counter       19 + check bit 
GLAR      Left-zeros counter for                           Geometric-load address 
          LVS instruction                                  counter 
          execution 
Originals of $XF, $XVLZ, $XVZ, $XVGZ, $XCZ, $XL, $XE, $XH 

*Both 1Y and 2Y may be used as I-box operand buffer. 
I-Checker (shared between I-box and Look-Ahead) 

Look-Ahead (LA) 

Look-Ahead buffer levels (LA0, LA1, LA2, LA3), each has: 

    Op code field (10 bits + check) 
    Operand field (64 bits + checks) 
    Indicator bit field (15 bits) 
    Instruction counter field (19 bits + checks) 

plus these bits for each buffer level: 

    NOOP   No-op bit 
    WBC    Word boundary crossover bit 
    LAOP   LA op code bit 
    IC     Instruction counter bit 
    INT    Internal fetch bit 
    LC     Level checked bit 
    LF     Level filled bit 
    FF     "Forward from" bit 
    DISC   Disconnect bit 

LAAR   Look-Ahead address register (18 + check) 

LAAR-Busy bit 
Store-executed bit 
Forward-cycle-required bit 
IC buffer (19 + checks) 

Counters (4-bit rings): 

    IAUC   (Instruction-arithmetic unit counter). For LA loading from I-box. 
    OCC    (Operand check counter). For check of opnd arrived from MBCU. 
    TBC    (Transfer bus counter). For loading of E-box. 
    ABC    (Arithmetic bus counter). For interrupt system updating, 
           internal opnd fetch. 
    SCC    (Store check counter). For storing into main memory. 

Interruption System 

$IND - Indicator register. 
$MASK - Mask register. 
$CPUS - Other CPU. (Not available for LASL and BuShips systems) 
$CA - Channel address register. 

Left zeros counter to handle interrupts. 

E-box 

$L, $R   Accumulator. 
$SB      Sign byte register. 
C, D     Operand buffer register (each 64 + check bits). 
$LZC     Left zeros counter. 
$AOC     All ones counter. 
SAU      Serial arithmetic unit 
SAU      decoder 
SAU      arithmetic-logical unit 
PAU      Parallel arithmetic unit 
PAU      decoder 
PAU      arithmetic-logical unit: 

             PAU - adder 
             PAU - multiplier 
             (PAU) - F-register 


Exchange 

Exchange storage (EM) 

EMAR. Exchange memory address register (7 bits + check). 

Word register (communicates with EM) (76 bits). 

MMAR. Main memory address register (18 + check bits) for dealing 
with MBCU. 

Buffer register (72 bits) to handle traffic with MBCU. 

Interrupt address register (7 + check) to contain address of 
interrupting channel. 

Interrupt triggers for 5 exchange interrupt conditions. 

Interrupt wait bit. 

Multiplexer for dealing with individual channels. 

ECC generator and comparing circuits. 
TABLE A-2. SUMMARY OF 7030 INSTRUCTION EXECUTION* 


Operation Code 

I-box opnd fetch 

I-box execution 

opnd fetch for LA 

LA levels 

LAAR needed? 

LAAR not needed but used if not busy 

SAU 

PAU 

Comments 

LX, LV, LC, LR, 

V+, V+C, 

V^CMCdO), V±1CR{C=0) 

R (index), RCB (index, C = 0) 

yes 

yes 

no 

IX Rec. 

no 

no 

no 

no 

V+CR (c=0) involves 2 I-box 
fetches. 

LVI, LVNI, LCI, LRI, C+I, 

V+I, V+IC, V+ICR (C≠0), 

RCZ (C≠0) (index) 

no 

yes 

no 

IX Rec. 

no 

no 

no 

no 


KV, KC 

yes 

yes 

no 

NOOP 

no 

no 

no 

no 


KVI, KCI 

no 

yes 

no 

NOOP 

no 

no 

no 

no 


SX, Z 

no 

partly 

no 

store 

yes 

no 

no 

no 


SV, SC, SR, SVA 

yes 

partly 

no 

store 

yes 

no 

no 

no 

SVA requires extra decoding 

j? (memory) , (memory), 

(C =0) 

2 

partly 

no 

store 

yes 

no 

no 

no 


(memory, C / 0) 

no 

yes 

no 

NOOP 

no 

no 

no 

no 


LVE 

N 

N 

no 

IX Rec. 

no 

no 

no 

no 

extra decoding for each instruction 
fetch 

LVS 

no 

yes 

no 

IX Rec. 

no 

no 

no 

no 

repeated addition of index value 
fields. 

RNX 

yes 

partly 

no 

test 

no 

no 

no 

no 

LA is pre-cleared by first level. 





store 

yes 

no 








NOOP 

no 

no 




EX (exclusive of subject in- 
struction) (repeated EX 
assumed) 

N 

yes 

no 

NOOP 

no 

no 

no 

no 

extra decoding for each instruction 
fetch. 

EXIC (exclusive of subject 
instruction) (repeated EXIC 
assumed) 

2N 

partly 

no 

N 

stores 

No 

times 

no 

no 

no 

extra decoding for each instruction 
fetch. 

T, SWAP 

N 

partly 

no 

test 

stores 

no 

N times 

no 

no 

no 

no 

LA is pre-cleared by first level. 

Each N is doubled in SWAP 





NOOP 

no 

no 




LA level designation 

INT = internal operand fetch 
INT STORE = internal opnd store 
Store = Store 
IX Rec. = index register recovery (also called pseudo-store) 
B Rec. = branch recovery 
LAOP = LA operation 
NOOP = no op; indicator transfer only 
Op = operation code level (VFL) 
opnd = operand level 
op + opnd = op code plus operand (usually F.P.) 


(Addresses are assumed to refer to Main Memory unless specified) 
TABLE A-2. SUMMARY OF 7030 INSTRUCTION EXECUTION (cont'd) 


Operation Code 

I-box operand fetch 

I-box execution 

opnd fetch for LA 

LA levels 

LAAR needed? 

LAAR not needed but used if not busy 

SAU 

PAU 

Comments 

1Y, 2Y are cleared to receive new instruction for all successful 
branches 

B, BR, BE, NOP 

no 

yes 

no 

NOOP 

no 

no 

no 

no 






test 






BD 

no 

yes 

no 

test 

no 

no 

no 

no 

LA pre-cleared 





NOOP 






BEW 

no 

partly 

no 

NOOP 

no 

no 

no 

no 






2 test 










levels 

no 

no 




CB, CBR (no refill) 

no 

yes 

no 

IX Rec. 

no 

no 

no 

no 


CBR (refill) 

yes 

yes 

no 

IX Rec 

no 

no 

no 

no 


Bind (XF, XCZ, XVLZ, XVZ, 

no 

yes 

no 

noop 

no 

no 

no 

no 


XVGZ, XL, XE, XH) 










Bind (non-index conditions) 

no 

partly 

INT 

OP. 

no 

no 

yes 

no 


* 




INT 

yes 

no 








INT. stores 

no 

no 








B Rec. 

no 

no 




BB 

no 

partly 

yes 

OP ; 

no 

no 

yes 







opnd 

no 

yes 








store 

yes 

no 








B Rec. 

no 

no 




SIC B, SIC BR, SIC BE, SIC BD 

yes 

partly 

no 

store 

j 

yes 

no 

no 

no 

LA pre-cleared in SIC BD by 2 test levels 

SIC BEW 

yes 

partly 

no 

store . 

yes 

no 

no 

no 



1 



2 test ; 

no 

no 








levels 






SIC CB, SIC CBR (no refill) 

yes 

partly 

no 

IX Rec. 

yes 

no 

no 

no 

SIC store level will not exist if branch 

if branch is taken 









is not taken 





Store 

no 

no 




SIC CBR (if refill) if branch 

2 

partly 

no 

IX Rec. 

yes 

no 

no 

no 

SIC store level will not exist if branch 

is taken 









is not taken 





store 

no 

no 




SIC Bind (index conditions) 

yes 

partly 

no i 

store 

yes 

no 

no 

no 

store level replaced by noop if branch 

if branch is taken 









is not taken 

SIC Bind (non-index conditions) 

yes 

partly 

no 

op 

no 

no 

yes 

no 






INT ! 

yes 

no 








INT store 

no 

no 








B Rec. 

no 

no 








store 

yes 

no 
TABLE A-2. SUMMARY OF 7030 INSTRUCTION EXECUTION (cont'd) 


Operation Code 

I-box operand fetch 

I-box execution 

opnd fetch for LA 

LA levels 

LAAR needed? 

LAAR not needed but used if not busy 

SAU 

PAU 

Comments 

F. P, 










±, L> LWF, ^MG 

no 

no 

yes 

op + opnd 

no 

yes 

exp. 

frac. 


DM DL, DLWhM Dm MG 










K, KMG, KR, KMGR 










/, R/ 










F±, E M 










EM / 

no 

no 

no 

op + opnd 

no 

no 

exp 

frac 


SUE 

no 

no 

no 

op ^ count 

no 

no 

no 

frac 


ST, SL0, SRD, SRT 

no 

no 

no 

store 

yes 

no 

exp 

frac 


M±, MM MG 

no 

no 

yes 

op + opnd 

no 

yes 

exp 

frac 






store 

yes 

no 





no 

no 

2 

op + opnd 

no 

yes 

exp 

frac 






C($FT) 

no 

yes 




LET,D/ 

no 

no 

yes 

op +opnd 

no 

yes 

exp 

frac 






store 

yes 

no 




VFL 










+, L, LWF, M MG 

no 

no 

yes 

op 

no 

no 

yes 

no 

For all VFL operations, if 

K, KMG, KR, KMGR 









opnd crosses over full Word 

KE, KF, KFE, KFR ■ 









boundary, the no. of LA opnd 

C, CT, lev 




opnd 

no 

yes 



and/or store levels is doubled 










Progressive indexing required one 










more IX box. 

♦ (Binary) 

no 

no 

yes 

op 

no 

no 

yes 

yes 






opnd 

no 

yes 




CV, DCV 

no 

no 

no 

op 

no 

no 

yes 

no 






INT 






ST, SRD 

no 

no 

yes 

op 

no 

no 

yes 

no 


CM, MM MG, MMl 














opnd 

no 

yes 








store 

yes 

no 




*+ (Binary) 

no 

no 

2 

op 

no 

no 

yes 

yes 






opnd 

no 

yes 








C($FT) 

no 

yes 




EFT, LTRS, LTRCV 

no 

no 

yes 

op 

no 

no 

yes 

no 


♦(Dec.), / (Dec.), ♦+(Dec.) 




opnd 

no 

yes 








store 

yes 

no 




I/O 

no 

no 

no 

op 

no 

no 

no 

no 

LA communicates with exchange 





LAOP 

no 

no 



directly 




